PÓLYA TYPE DISTRIBUTIONS, II


By Samuel Karlin
Stanford University 


In a previous publication [1] we introduced a specific smoothing property that characterizes a class of distributions, which we called Pólya Type (P.T.) distributions. Most of the standard distributions occurring in statistical practice are of Pólya Type. For this class of distributions many of the usual decision-theoretic questions were analyzed. In the case of the two-action problem, complete classes of statistical procedures were characterized, and Bayes and admissible procedures were also determined. This paper continues the development of statistical applications for Pólya Type distributions. We are still principally concerned with the two-action problem. In a subsequent publication the n-action and estimation problems for P.T. distributions will be presented.

Our investigation is divided into three main parts. Part I describes some new characterizations of P.T. distributions. Attention is called to Lemma 3, which is very useful in establishing the fundamental variation diminishing properties of P.T. distributions described in Theorem 3. Finally, Part I closes with two further results about the sum of two random variables, one of which has a P.T. distribution.

In Part II we examine in detail many of the standard Neyman-Pearson concepts for the case when the underlying distributions are known to be of Pólya Type. Representative topics include the principle of unbiasedness, envelope power functions and likelihood ratio tests. Specifically, we show that in any testing problem uniformly most powerful unbiased tests always exist, and in fact they can easily be constructed explicitly. We deal here only with the case of a single free parameter. For many examples, however, a problem involving several parameters can be reduced to a one-parameter problem by using the principle of similarity or the principle of invariance, and our theory then applies directly. Another interesting consequence of the theory is that the likelihood ratio test for a composite hypothesis versus a composite alternative is an admissible test when the underlying family of distributions is of Pólya Type.

A general minimax theorem for the two-action decision problem is developed in Part III. We show that the game defined by the usual risk function has a value (under very mild conditions imposed on the loss functions). Furthermore, the optimal strategies for both the statistician and nature are characterized. Specific attention is directed to the one- and two-sided testing problems. A discussion of the computational work required to obtain the minimax strategies is also given.


Received November 18, 1955. 


1 Technical Report 1, prepared under contract Nonr-220(16) for the Office of Naval 
Research, Reference No. NR 042-995. 




Although the three parts are closely related, they may be read separately, with only a few references from one part to another.

Finally, I wish to express my gratitude to Rupert Miller for his help in the writing of the manuscript.


Part I. Definition and Properties of Pólya Type Distributions.
Sec. 1. Definitions and preliminaries.


Def. 1. A family of distributions

$$P(x,\omega) = \beta(\omega)\int_{-\infty}^{x} p(t,\omega)\,d\mu(t)$$

of a real random variable X depending on a real parameter ω is said to belong to the class $\mathcal{P}_n$ (Pólya Type n) if

$$\begin{vmatrix} p(x_1,\omega_1) & \cdots & p(x_1,\omega_m)\\ \vdots & & \vdots\\ p(x_m,\omega_1) & \cdots & p(x_m,\omega_m)\end{vmatrix} \ge 0 \qquad (1.1)$$

for every $1 \le m \le n$ and all $x_1 < x_2 < \cdots < x_m$ and $\omega_1 < \omega_2 < \cdots < \omega_m$. The family belongs strictly to $\mathcal{P}_n$ if strict inequality holds in (1.1). Here μ is a σ-finite measure on the real line and p(x, ω) is taken to be continuous in each variable. Most of the results can easily be extended to the case where p(x, ω) is allowed to have a finite number of discontinuities of the first kind in each variable separately.

If the family of distributions P(x, ω) belongs to $\mathcal{P}_n$ for every n, then we say that the family belongs to $\mathcal{P}_\infty$. If it belongs strictly to $\mathcal{P}_n$ for every n, then it belongs strictly to $\mathcal{P}_\infty$. We shall sometimes say that p(x, ω) is Pólya Type n (∞) if P(x, ω) belongs to $\mathcal{P}_n$ ($\mathcal{P}_\infty$).

For n = 1, 2 the conditions for being Pólya Type n reduce to familiar ones. The density p is Pólya Type 1 (strictly Pólya Type 1) if and only if $p(x,\omega) \ge 0$ ($> 0$) for all x and ω. It is Pólya Type 2 if and only if it has a monotone likelihood ratio, i.e., for every $x_1 < x_2$, $p(x_1,\omega)/p(x_2,\omega)$ is nonincreasing in ω. It is strictly Pólya Type 2 if and only if it has a strict monotone likelihood ratio, i.e., $p(x_1,\omega)/p(x_2,\omega)$ is decreasing in ω for $x_1 < x_2$.

The distributions that can be classified as Pólya Type include almost all of the principal distributions occurring in statistical practice. The exponential family and the noncentral t, noncentral F and noncentral chi-square distributions all belong strictly to $\mathcal{P}_\infty$. For a proof of this the reader is referred to [1]. Other examples are given in [2]. The most notable example of a density which is not of Pólya Type is the Cauchy density, i.e.,

$$p(x,\omega) = \frac{1}{\pi}\,\frac{1}{1+(x-\omega)^2}.$$
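Condition (1.1) can be checked mechanically on a finite grid of points. The Python sketch below is an illustration only and is not part of the paper. The helper names `det` and `satisfies_polya_condition` are hypothetical, and normalizing constants are dropped because positive factors do not change the signs of the minors. A finite grid can only refute membership in $\mathcal{P}_n$; it cannot prove it. The sketch contrasts the normal location kernel, which is strictly P.T. ∞, with the Cauchy kernel above.

```python
import itertools
import math


def det(a):
    """Determinant by cofactor expansion along the first row (fine for small m)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def satisfies_polya_condition(p, xs, ws, n, strict=False):
    """Check (1.1) on a finite grid: every m x m minor of ||p(x_i, w_j)|| with
    increasing x's and w's must be >= 0 (> 0 if strict), for 1 <= m <= n.
    A finite grid can refute membership in P_n but cannot prove it."""
    xs, ws = sorted(xs), sorted(ws)
    kernel = [[p(x, w) for w in ws] for x in xs]
    for m in range(1, n + 1):
        # combinations() yields increasing index tuples, hence increasing x's and w's
        for rows in itertools.combinations(range(len(xs)), m):
            for cols in itertools.combinations(range(len(ws)), m):
                d = det([[kernel[i][j] for j in cols] for i in rows])
                if d < 0 or (strict and d == 0):
                    return False
    return True


# Location kernels with normalizing constants dropped (they do not affect signs).
def normal(x, w):
    return math.exp(-(x - w) ** 2 / 2)


def cauchy(x, w):
    return 1.0 / (1.0 + (x - w) ** 2)


print(satisfies_polya_condition(normal, [-1.0, 0.0, 1.5], [-0.5, 0.3, 1.0], 3, strict=True))
print(satisfies_polya_condition(cauchy, [0.0, 1.0], [-10.0, -9.0], 2))
```

For the Cauchy kernel, the 2 × 2 minor at $x = (0, 1)$, $\omega = (-10, -9)$ equals $1/101^2 - 1/(82 \cdot 122) < 0$. This is exactly a failure of the monotone likelihood ratio.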

Sec. 2. Some characterizations of Pólya Type distributions. This section presents some alternative characterizations and some analytic properties of Pólya Type distributions. Theorem 3 and its corollaries should be noted carefully, because the decision theory for Pólya Type distributions developed in [1] can alternatively be based almost entirely on this theorem.


Theorem 1. If p is Pólya Type 2 and the derivatives involved exist everywhere, then

$$\begin{vmatrix} p(x_1,\omega) & \dfrac{\partial}{\partial\omega}p(x_1,\omega)\\[6pt] p(x_2,\omega) & \dfrac{\partial}{\partial\omega}p(x_2,\omega)\end{vmatrix} \ge 0 \qquad (2.2)$$

for all ω and all $x_1 < x_2$, and

$$\begin{vmatrix} p(x,\omega) & \dfrac{\partial}{\partial\omega}p(x,\omega)\\[6pt] \dfrac{\partial}{\partial x}p(x,\omega) & \dfrac{\partial^2}{\partial x\,\partial\omega}p(x,\omega)\end{vmatrix} \ge 0 \qquad (2.3)$$

for all ω and all x. Conversely, if p(x, ω) > 0 for all x and ω, then (2.3) implies (2.2), which in turn implies that p is Pólya Type 2. Strict inequality in (2.3) implies strict inequality in (2.2), and this implies that p is strictly Pólya Type 2.

Remark. The requirement that p(x, ω) > 0 in the converse theorem can be greatly relaxed by use of a device which will be fully explained in connection with Theorem 2 below.
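Proof. $p \in \mathcal{P}_2$ implies that for all $x_1 < x_2$ and $\omega_1 < \omega_2$,

$$0 \le \frac{1}{\omega_2-\omega_1}\begin{vmatrix} p(x_1,\omega_1) & p(x_1,\omega_2)\\ p(x_2,\omega_1) & p(x_2,\omega_2)\end{vmatrix} = \begin{vmatrix} p(x_1,\omega_1) & \dfrac{p(x_1,\omega_2)-p(x_1,\omega_1)}{\omega_2-\omega_1}\\[8pt] p(x_2,\omega_1) & \dfrac{p(x_2,\omega_2)-p(x_2,\omega_1)}{\omega_2-\omega_1}\end{vmatrix}.$$

The limit as $\omega_2$ approaches $\omega_1$ gives (2.2). In the same way, (2.3) is obtained from (2.2) by operating on the rows instead of the columns.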

The converse is established by showing that p has a monotone likelihood ratio. Indeed, (2.3) can be written as

$$[p(x,\omega)]^2\,\frac{\partial}{\partial x}\left(\frac{\frac{\partial}{\partial\omega}p(x,\omega)}{p(x,\omega)}\right) \ge 0 \qquad (2.4)$$

for all x and ω. This implies that $\frac{\partial}{\partial\omega}p(x,\omega)/p(x,\omega)$ is nondecreasing in x for all ω. This yields (2.2), which in turn implies

$$[p(x_1,\omega)]^2\,\frac{\partial}{\partial\omega}\left(\frac{p(x_2,\omega)}{p(x_1,\omega)}\right) \ge 0 \qquad (2.5)$$

for all $x_1 < x_2$. By (2.5), $p(x_2,\omega)/p(x_1,\omega)$ is nondecreasing in ω for all $x_1 < x_2$, i.e., p has a monotone likelihood ratio.

The strict converse is obtained by replacing ≥ by >, nondecreasing by increasing, and monotone by strictly monotone in the preceding paragraph.


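Corollary 1. Suppose $(\partial^2/\partial x\,\partial\omega)\log p(x,\omega)$ exists and p belongs strictly to $\mathcal{P}_1$. Then p belongs to $\mathcal{P}_2$ if and only if

$$\frac{\partial^2}{\partial x\,\partial\omega}\log p(x,\omega) \ge 0$$

for all x and ω.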
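Corollary 1 gives a local test for Pólya Type 2 that is easy to evaluate numerically. The sketch below assumes p is positive and smooth. `mixed_log_partial` is a hypothetical helper that uses a central finite difference with step h. For the normal location kernel, the mixed partial of log p equals 1 everywhere. For the Cauchy kernel it equals $2(1-t^2)/(1+t^2)^2$ with $t = x-\omega$, which is negative for |t| > 1, so the Cauchy family is not Pólya Type 2.

```python
import math


def mixed_log_partial(p, x, w, h=1e-3):
    """Central-difference estimate of d^2/(dx dw) log p(x, w); assumes p > 0."""
    def lp(a, b):
        return math.log(p(a, b))
    return (lp(x + h, w + h) - lp(x + h, w - h)
            - lp(x - h, w + h) + lp(x - h, w - h)) / (4 * h * h)


def normal(x, w):
    return math.exp(-(x - w) ** 2 / 2)      # normalizing constant omitted


def cauchy(x, w):
    return 1.0 / (1.0 + (x - w) ** 2)       # normalizing constant omitted


# Exact values: 1 for the normal kernel; 2(1 - t^2)/(1 + t^2)^2, t = x - w, for Cauchy.
for x, w in [(0.3, -1.2), (0.5, 0.0), (2.0, 0.0)]:
    print(x, w, mixed_log_partial(normal, x, w), mixed_log_partial(cauchy, x, w))
```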
We now turn to a generalization of Theorem 1 for density functions which are of Pólya Type of arbitrary degree.


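Theorem 2. If p is Pólya Type m and all the derivatives involved exist everywhere, then

$$\begin{vmatrix} p(x_1,\omega) & \dfrac{\partial}{\partial\omega}p(x_1,\omega) & \cdots & \dfrac{\partial^{n-1}}{\partial\omega^{n-1}}p(x_1,\omega)\\ \vdots & \vdots & & \vdots\\ p(x_n,\omega) & \dfrac{\partial}{\partial\omega}p(x_n,\omega) & \cdots & \dfrac{\partial^{n-1}}{\partial\omega^{n-1}}p(x_n,\omega)\end{vmatrix} \ge 0 \qquad (2.6)$$

for all $n \le m$, ω, and $x_1 < x_2 < \cdots < x_n$, and

$$\begin{vmatrix} p(x,\omega) & \dfrac{\partial}{\partial\omega}p(x,\omega) & \cdots & \dfrac{\partial^{n-1}}{\partial\omega^{n-1}}p(x,\omega)\\ \dfrac{\partial}{\partial x}p(x,\omega) & \dfrac{\partial^{2}}{\partial x\,\partial\omega}p(x,\omega) & \cdots & \dfrac{\partial^{n}}{\partial x\,\partial\omega^{n-1}}p(x,\omega)\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial^{n-1}}{\partial x^{n-1}}p(x,\omega) & \dfrac{\partial^{n}}{\partial x^{n-1}\partial\omega}p(x,\omega) & \cdots & \dfrac{\partial^{2n-2}}{\partial x^{n-1}\partial\omega^{n-1}}p(x,\omega)\end{vmatrix} \ge 0 \qquad (2.7)$$

for all $n \le m$, ω, and x. Conversely, strict inequality in (2.6) for every $1 \le n \le m$ implies that p is strictly Pólya Type m. Strict inequality in (2.7) for all $n \le m$ implies the same in (2.6).

Proof. We need the following lemma.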







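Lemma 1. If $f_1, f_2, \dots, f_n$ are differentiable real-valued functions on the real line and $\xi_1 < \xi_2$, then there exists a ξ, $\xi_1 < \xi < \xi_2$, such that

$$\begin{vmatrix} a_{11} & \cdots & a_{1j} & f_1(\xi_1) & f_1(\xi_2) & a_{1,j+3} & \cdots & a_{1n}\\ \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nj} & f_n(\xi_1) & f_n(\xi_2) & a_{n,j+3} & \cdots & a_{nn}\end{vmatrix} = (\xi_2-\xi_1)\begin{vmatrix} a_{11} & \cdots & a_{1j} & f_1(\xi_1) & f_1'(\xi) & a_{1,j+3} & \cdots & a_{1n}\\ \vdots & & \vdots & \vdots & \vdots & \vdots & & \vdots\\ a_{n1} & \cdots & a_{nj} & f_n(\xi_1) & f_n'(\xi) & a_{n,j+3} & \cdots & a_{nn}\end{vmatrix},$$

where the $a_{ij}$'s are any real numbers.

The proof of this lemma is an easy application of the mean value theorem and will be omitted. Regarded as a function of $\xi_2$, the left-hand determinant vanishes at $\xi_2 = \xi_1$, and its derivative is the right-hand determinant without the factor $\xi_2 - \xi_1$.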

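The proof of the first part of the theorem proceeds as follows. Let $p_i(\omega) = p(x_i,\omega)$ and $p_i^{(k)}(\omega) = (d^k/d\omega^k)\,p_i(\omega)$. Suppose n and $x_1 < x_2 < \cdots < x_n$ are given. For $\omega_1 < \omega_2 < \cdots < \omega_n$,

$$0 \le \operatorname{sgn}\begin{vmatrix} p_1(\omega_1) & p_1(\omega_2) & \cdots & p_1(\omega_n)\\ \vdots & \vdots & & \vdots\\ p_n(\omega_1) & p_n(\omega_2) & \cdots & p_n(\omega_n)\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} p_1(\omega_1) & p_1'(\omega_2') & p_1''(\omega_3'') & \cdots & p_1^{(n-1)}(\omega_n^{(n-1)})\\ \vdots & \vdots & \vdots & & \vdots\\ p_n(\omega_1) & p_n'(\omega_2') & p_n''(\omega_3'') & \cdots & p_n^{(n-1)}(\omega_n^{(n-1)})\end{vmatrix} \qquad (2.8)$$

where $\omega_{i-1}^{(j-1)} \le \omega_i^{(j)} \le \omega_i^{(j-1)}$. This equality is obtained by repeated application of Lemma 1. Here sgn is the function which equals +1 if its argument is positive, −1 if its argument is negative, and 0 if its argument is zero. Letting $\omega_1 \to \omega$, $\omega_2 \to \omega$, …, $\omega_n \to \omega$, the last determinant approaches the determinant in (2.6), and therefore (2.6) must hold. To derive (2.7) from (2.6), apply to the rows of the determinants in (2.6) the same operations that were applied to the columns of the first determinant in (2.8).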
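For the exponential family kernel $p(x,\omega) = e^{x\omega}$, the determinant in (2.6) has a closed form. Since $\partial^k e^{x\omega}/\partial\omega^k = x^k e^{x\omega}$, it equals $e^{\omega\sum x_i}\prod_{i<j}(x_j-x_i)$. This is a Vandermonde determinant times a positive factor, so it is positive whenever $x_1 < \cdots < x_n$. The sketch below compares a direct numerical evaluation with this closed form. Its helper names are hypothetical. It illustrates (2.6) for one strictly P.T. ∞ kernel and is not part of the proof.

```python
import math


def det(a):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def determinant_26(xs, w):
    """The determinant in (2.6) for p(x, w) = exp(x w): row i holds
    p, dp/dw, ..., d^{n-1}p/dw^{n-1} at (x_i, w), using d^k/dw^k e^{xw} = x^k e^{xw}."""
    n = len(xs)
    return det([[x ** k * math.exp(x * w) for k in range(n)] for x in xs])


def closed_form_26(xs, w):
    """exp(w * sum(x_i)) times the Vandermonde product prod_{i<j} (x_j - x_i)."""
    vandermonde = 1.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            vandermonde *= xs[j] - xs[i]
    return math.exp(w * sum(xs)) * vandermonde


xs, w = [-1.0, 0.5, 2.0, 3.0], 0.4
print(determinant_26(xs, w), closed_form_26(xs, w))
```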
The proof of the converse of this theorem depends on Lemma 2 below.

Lemma 3, together with its consequence Lemma 3a, will be needed later in proving Theorem 3. It is included at this point because its proof is similar to the proof of Lemma 2.


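Lemma 2. If all the derivatives involved exist and are continuous and

$$\begin{vmatrix} p_1(\omega) & p_1'(\omega) & \cdots & p_1^{(n-1)}(\omega)\\ \vdots & \vdots & & \vdots\\ p_n(\omega) & p_n'(\omega) & \cdots & p_n^{(n-1)}(\omega)\end{vmatrix} > 0$$

for all ω and $n \le m$, where $p_i^{(j)}$ is the jth derivative of $p_i$, then

$$\begin{vmatrix} p_1(\omega_1) & \cdots & p_1(\omega_n)\\ \vdots & & \vdots\\ p_n(\omega_1) & \cdots & p_n(\omega_n)\end{vmatrix} > 0 \qquad (2.10)$$

for all $\omega_1 < \cdots < \omega_n$ and all $n \le m$.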

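Proof. The proof proceeds by mathematical induction. Clearly the lemma holds for m = 1. Suppose it holds up to m − 1, and suppose $\omega_1 < \omega_2 < \cdots < \omega_n$ are given. Let $q_i(\omega) = p_i(\omega)/p_1(\omega)$; note that $p_1 > 0$ by the case n = 1 of the hypothesis. Then

$$\operatorname{sgn}\begin{vmatrix} p_1(\omega_1) & \cdots & p_1(\omega_n)\\ \vdots & & \vdots\\ p_n(\omega_1) & \cdots & p_n(\omega_n)\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} 1 & \cdots & 1\\ q_2(\omega_1) & \cdots & q_2(\omega_n)\\ \vdots & & \vdots\\ q_n(\omega_1) & \cdots & q_n(\omega_n)\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} 1 & 0 & \cdots & 0\\ q_2(\omega_1) & q_2'(u_2) & \cdots & q_2'(u_n)\\ \vdots & \vdots & & \vdots\\ q_n(\omega_1) & q_n'(u_2) & \cdots & q_n'(u_n)\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} q_2'(u_2) & \cdots & q_2'(u_n)\\ \vdots & & \vdots\\ q_n'(u_2) & \cdots & q_n'(u_n)\end{vmatrix} \qquad (2.11)$$

where $\omega_1 < u_2 < \omega_2 < u_3 < \cdots < u_n < \omega_n$. But

$$\frac{d^j}{d\omega^j}\left(\frac{f(\omega)}{p_1(\omega)}\right) = \frac{1}{p_1(\omega)}\,\frac{d^j f(\omega)}{d\omega^j} + \sum_{k=0}^{j-1} a_k(\omega)\,\frac{d^k f(\omega)}{d\omega^k},$$

where the $a_k(\omega)$'s do not depend on f. Therefore, for all ω,

$$\begin{vmatrix} p_1(\omega) & p_1'(\omega) & \cdots & p_1^{(n-1)}(\omega)\\ p_2(\omega) & p_2'(\omega) & \cdots & p_2^{(n-1)}(\omega)\\ \vdots & \vdots & & \vdots\\ p_n(\omega) & p_n'(\omega) & \cdots & p_n^{(n-1)}(\omega)\end{vmatrix} = [p_1(\omega)]^n\begin{vmatrix} q_1(\omega) & 0 & \cdots & 0\\ q_2(\omega) & q_2'(\omega) & \cdots & q_2^{(n-1)}(\omega)\\ \vdots & \vdots & & \vdots\\ q_n(\omega) & q_n'(\omega) & \cdots & q_n^{(n-1)}(\omega)\end{vmatrix}. \qquad (2.12)$$

The first determinant in (2.12) is positive by assumption, and $q_1 \equiv 1$. Hence, for all ω and $n \le m$,

$$\begin{vmatrix} q_2'(\omega) & \cdots & q_2^{(n-1)}(\omega)\\ \vdots & & \vdots\\ q_n'(\omega) & \cdots & q_n^{(n-1)}(\omega)\end{vmatrix} > 0.$$

By the induction assumption this implies

$$\begin{vmatrix} q_2'(u_2) & \cdots & q_2'(u_n)\\ \vdots & & \vdots\\ q_n'(u_2) & \cdots & q_n'(u_n)\end{vmatrix} > 0$$

for all $u_2 < \cdots < u_n$ and all $n \le m$. But this determinant equals the last determinant in (2.11), so (2.10) follows.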
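Lemma 3. If p is strictly Pólya Type ∞ and all the derivatives involved exist and are continuous, then

$$\operatorname{sgn}\det\|p(x_i,\omega_j)\| = \operatorname{sgn} q_n(\xi,\omega_{n+1},\omega_n,\dots,\omega_1), \qquad i,j = 1,\dots,n+1,\quad x_1 < x_2 < \cdots < x_{n+1},$$

for some appropriate ξ satisfying $x_1 < \xi < x_{n+1}$, where

$$q_1(\xi,\omega,\omega_1) = \frac{d}{dx}\left(\frac{p(x,\omega)}{p(x,\omega_1)}\right)\bigg|_{x=\xi}, \qquad q_2(\xi,\omega,\omega_2,\omega_1) = \frac{d}{dx}\left(\frac{\frac{d}{dx}\frac{p(x,\omega)}{p(x,\omega_1)}}{\frac{d}{dx}\frac{p(x,\omega_2)}{p(x,\omega_1)}}\right)\bigg|_{x=\xi},$$

and in general

$$q_k(\xi,\omega,\omega_k,\omega_{k-1},\dots,\omega_1) = \frac{d}{dx}\left(\frac{q_{k-1}(x,\omega,\omega_{k-1},\dots,\omega_1)}{q_{k-1}(x,\omega_k,\omega_{k-1},\dots,\omega_1)}\right)\bigg|_{x=\xi},$$

where $\omega_1 < \omega_2 < \cdots < \omega_n$, but ω and $\omega_{n+1}$ are allowed to occur anywhere. (The notation means that the derivatives are taken with respect to x and evaluated at x = ξ.)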
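Proof. The proof proceeds by induction. Let $p(x_i,\omega_j) = p_j(x_i)$. For n = 1, Lemma 1 gives

$$\operatorname{sgn}\begin{vmatrix} p_1(x_1) & p_1(x_2)\\ p_2(x_1) & p_2(x_2)\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} 1 & 1\\ \dfrac{p_2(x_1)}{p_1(x_1)} & \dfrac{p_2(x_2)}{p_1(x_2)}\end{vmatrix} = \operatorname{sgn}\frac{d}{dx}\left(\frac{p_2(x)}{p_1(x)}\right)\bigg|_{x=\xi} = \operatorname{sgn} q_1(\xi,\omega_2,\omega_1).$$

Assume the lemma is true for n − 1. Dividing each column by $p_1(x_j) > 0$ and applying Lemma 1 to adjacent columns,

$$\operatorname{sgn}\begin{vmatrix} p_1(x_1) & \cdots & p_1(x_{n+1})\\ \vdots & & \vdots\\ p_{n+1}(x_1) & \cdots & p_{n+1}(x_{n+1})\end{vmatrix} = \operatorname{sgn}\begin{vmatrix} 1 & 0 & \cdots & 0\\ \dfrac{p_2(x_1)}{p_1(x_1)} & q_1(\xi_2,\omega_2,\omega_1) & \cdots & q_1(\xi_{n+1},\omega_2,\omega_1)\\ \vdots & \vdots & & \vdots\\ \dfrac{p_{n+1}(x_1)}{p_1(x_1)} & q_1(\xi_2,\omega_{n+1},\omega_1) & \cdots & q_1(\xi_{n+1},\omega_{n+1},\omega_1)\end{vmatrix},$$

where $x_{i-1} < \xi_i < x_i$, $i = 2,\dots,n+1$. By the induction hypothesis, $q_1(\xi_j,\omega_2,\omega_1) > 0$ for $j = 2,\dots,n+1$. Therefore, dividing each column of the remaining n × n determinant by its (positive) entry in the first row and applying Lemma 1 again, this determinant has the same sign as

$$\begin{vmatrix} q_2(\xi_3',\omega_3,\omega_2,\omega_1) & \cdots & q_2(\xi_{n+1}',\omega_3,\omega_2,\omega_1)\\ \vdots & & \vdots\\ q_2(\xi_3',\omega_{n+1},\omega_2,\omega_1) & \cdots & q_2(\xi_{n+1}',\omega_{n+1},\omega_2,\omega_1)\end{vmatrix},$$

where $\xi_{i-1} < \xi_i' < \xi_i$, $i = 3,\dots,n+1$. We continue in this manner, at each step using the fact that the first row is positive by the induction assumption. The sequence terminates with the expression $q_n(\xi,\omega_{n+1},\omega_n,\dots,\omega_1)$, $x_1 < \xi < x_{n+1}$, which has the same sign as the original determinant.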

Here ξ is a real number which lies between $x_1$ and $x_{n+1}$ and depends on $x_1, x_2, \dots, x_{n+1}$. The sign of $\det\|p(x_i,\omega_j)\|$ is the same for all values of the $x_i$'s and $\omega_j$'s, as long as the same order relation holds among them. By the continuity assumptions in Lemma 3, ξ can be made to take any real value by varying $x_1, x_2, \dots, x_{n+1}$. Hence we have the following lemma.

Lemma 3a. If the conditions of Lemma 3 are satisfied, then

$$\operatorname{sgn}\det\|p(x_i,\omega_j)\| = \operatorname{sgn} q_n(x,\omega_{n+1},\omega_n,\dots,\omega_1),$$

where $x_1 < x_2 < \cdots < x_{n+1}$, $\omega_1 < \omega_2 < \cdots < \omega_n$, $\omega_{n+1}$ is allowed to occur anywhere, and x is any real number.

The proof of the converse to Theorem 2 follows readily from Lemma 2. Strict inequality in (2.6) for every $n \le m$ implies that p is strictly Pólya Type m, by Lemma 2 applied with $p_i(\omega) = p(x_i,\omega)$. Strict inequality in (2.7) implies strict inequality in (2.6), by Lemma 2 applied with $p_i(x) = (\partial^{i-1}/\partial\omega^{i-1})\,p(x,\omega)$, where ω is held fixed and x plays the role of ω in Lemma 2.

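The converse statement in Theorem 2 involves strict inequality in (2.6) and (2.7). What can be said if (2.6) and (2.7) hold only with ≥? A positive result can be obtained under the following mild condition. Suppose relations (2.6) and (2.7) are valid, and suppose that for every ω and $n \le m$ they hold with strict inequality for some $x_1 < x_2 < \cdots < x_n$, which may depend upon ω. Then we can generally still prove that p is Pólya Type m by the following device. Let

$$p_\sigma(x,\omega) = \frac{\int_{-\infty}^{\infty}\varphi(x-u,\sigma)\,p(u,\omega)\,d\mu(u)}{\int_{-\infty}^{\infty}\varphi(x-u,\sigma)\,d\mu(u)}, \qquad (2.13)$$

where φ is the normal density function with mean 0 and variance σ. The measure μ is chosen so that the integrals exist and are positive, with μ assigning positive measure to every open interval. As σ → 0, $p_\sigma(x,\omega) \to p(x,\omega)$ uniformly in any finite interval, and

$$\frac{\partial^n}{\partial\omega^n}p_\sigma(x,\omega) = \frac{\int_{-\infty}^{\infty}\varphi(x-u,\sigma)\,\frac{\partial^n}{\partial\omega^n}p(u,\omega)\,d\mu(u)}{\int_{-\infty}^{\infty}\varphi(x-u,\sigma)\,d\mu(u)} \to \frac{\partial^n}{\partial\omega^n}p(x,\omega), \qquad (2.14)$$

where we have assumed that (2.14) is valid, i.e., that the integral can be differentiated inside the integral sign. But²

$$\begin{vmatrix} p_\sigma(x_1,\omega) & \frac{\partial}{\partial\omega}p_\sigma(x_1,\omega) & \cdots & \frac{\partial^{n-1}}{\partial\omega^{n-1}}p_\sigma(x_1,\omega)\\ \vdots & \vdots & & \vdots\\ p_\sigma(x_n,\omega) & \frac{\partial}{\partial\omega}p_\sigma(x_n,\omega) & \cdots & \frac{\partial^{n-1}}{\partial\omega^{n-1}}p_\sigma(x_n,\omega)\end{vmatrix} = \frac{1}{\prod_{i=1}^{n}\int\varphi(x_i-u,\sigma)\,d\mu(u)}\int\!\!\cdots\!\!\int_{u_1<\cdots<u_n}\det\|\varphi(x_i-u_j,\sigma)\|\,\begin{vmatrix} p(u_1,\omega) & \frac{\partial}{\partial\omega}p(u_1,\omega) & \cdots & \frac{\partial^{n-1}}{\partial\omega^{n-1}}p(u_1,\omega)\\ \vdots & \vdots & & \vdots\\ p(u_n,\omega) & \frac{\partial}{\partial\omega}p(u_n,\omega) & \cdots & \frac{\partial^{n-1}}{\partial\omega^{n-1}}p(u_n,\omega)\end{vmatrix}\,d\mu(u_1)\cdots d\mu(u_n).$$

Since φ is strictly Pólya Type ∞ for each σ, the first determinant in the integrand is always positive. By assumption the second determinant is not identically zero. Therefore, by Theorem 2, $p_\sigma$ satisfies the determinant criterion for strictly Pólya Type m densities. It follows that p is Pólya Type m, since $\det\|p(x_i,\omega_j)\| = \lim_{\sigma\to 0}\det\|p_\sigma(x_i,\omega_j)\|$.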

² See G. Pólya and G. Szegő, Aufgaben und Lehrsätze aus der Analysis, Vol. 1, Problem 68.







This completes the characterizations of Pólya Type distributions that will be given here. The remaining theorems and corollaries summarize the main properties of Pólya Type distributions. Theorem 3 and its corollaries are crucial to the decision theory in Parts II and III.


Sec. 3. Basic oscillation theorem for P.T. distributions. The following definition is needed to make the concepts in the theorem precise.

Def. 2. The number of sign changes V(h) of a function h(ω) is taken to be $\sup_{\omega_1<\cdots<\omega_n} N(h(\omega_i))$, where $N(h(\omega_i))$ is the number of changes of sign of the sequence $h(\omega_1), h(\omega_2), \dots, h(\omega_n)$, $\omega_i < \omega_{i+1}$. A point $\omega_0$ is called a change point for h(ω) in either of two cases. The first case is that $h(\omega)h(\omega') \le 0$ whenever $\omega \le \omega_0 \le \omega'$ with $\omega \ne \omega'$ (ω, ω′ sufficiently near $\omega_0$), with strict inequality for some specific choice of ω and ω′. The second case is that $h(\omega_0)h(\omega)h(\omega') \le 0$ for $\omega < \omega_0 < \omega'$.
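The count $N(h(\omega_i))$ in Def. 2 is easy to compute for a finite sequence. The sketch below uses hypothetical helper names. It assumes the usual convention that zero terms are discarded before counting. Since V(h) is a supremum over all finite sequences, evaluating h on a single grid gives only a lower bound. The last example is positive except at a single point, where it is negative. A grid that hits that point sees two sign changes, which matches the case of two coinciding change points.

```python
def sign_changes(values, tol=0.0):
    """N(h(w_1), ..., h(w_n)) from Def. 2: number of sign changes in a finite
    sequence, after discarding terms with |v| <= tol (zeros)."""
    signs = [v > 0 for v in values if abs(v) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def approx_V(h, lo, hi, num=2001):
    """Lower bound for V(h): sign changes of h on an evenly spaced grid.
    V(h) is a supremum over all finite sequences, and adding points to a
    sequence can only raise its count."""
    grid = [lo + (hi - lo) * k / (num - 1) for k in range(num)]
    return sign_changes([h(w) for w in grid])


print(approx_V(lambda w: (w + 1) * (w - 1) * (w - 2), -5.0, 5.0))    # three simple zeros
print(approx_V(lambda w: (w - 1) ** 2, -5.0, 5.0))                   # a double zero only
print(approx_V(lambda w: -1.0 if w == 0 else 1.0, -1.0, 1.0, num=3))  # negative at one point
```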

Theorem 3. Let p be strictly Pólya Type ∞, and assume that p can be differentiated n times with respect to x for all ω. Let F be a measure on the real line, and let h be a function of ω which changes sign n times. If

$$g(x) = \int p(x,\omega)\,h(\omega)\,dF(\omega)$$

can be differentiated n times with respect to x inside the integral sign, then either g is identically zero, or g changes sign at most n times and has at most n zeros counting multiplicities. The function g is identically zero if and only if the spectrum of F is contained in the set of zeros of h.
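Proof. Let $\omega_1 < \omega_2 < \cdots < \omega_n$ be the change points of h. Form

$$g_1(x,\omega_1) = \frac{d}{dx}\left(\frac{g(x)}{p(x,\omega_1)}\right) = \int \frac{d}{dx}\left(\frac{p(x,\omega)}{p(x,\omega_1)}\right)h(\omega)\,dF(\omega) = \int q_1(x,\omega,\omega_1)\,h(\omega)\,dF(\omega),$$

$$g_k(x,\omega_k,\dots,\omega_1) = \frac{d}{dx}\left(\frac{g_{k-1}(x,\omega_{k-1},\dots,\omega_1)}{q_{k-1}(x,\omega_k,\omega_{k-1},\dots,\omega_1)}\right), \qquad k = 2,\dots,n,$$

so that

$$g_n(x,\omega_n,\omega_{n-1},\dots,\omega_1) = \int q_n(x,\omega,\omega_n,\dots,\omega_1)\,h(\omega)\,dF(\omega). \qquad (2.15)$$

The function $g_n(x,\omega_n,\dots,\omega_1)$ is the function $q_n(x,\omega,\omega_n,\dots,\omega_1)$ with p(x, ω) replaced by g(x). All the above integrands are well defined since Lemma 3 can be applied.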

Suppose for definiteness that h(ω) > 0 for $\omega < \omega_1$ and that n is even. Take $x_1 < x_2 < \cdots < x_{n+1}$ and $\omega_{n+1} = \omega < \omega_1$, and consider $\det\|p(x_i,\omega_j)\|$, i, j = 1, 2, …, n + 1. It has the same sign as the determinant obtained by moving its last column (the one belonging to $\omega_{n+1} = \omega$) to the first position, because this is a cyclic permutation of n + 1 columns and n is even. This last determinant is positive, since p is assumed to be strictly Pólya Type ∞. Hence, by Lemma 3 (in the form of Lemma 3a), $q_n(x,\omega,\omega_n,\dots,\omega_1) > 0$ for $\omega < \omega_1$, so the integrand in (2.15) is nonnegative there.

Now take $\omega_1 < \omega < \omega_2$, again with $\omega = \omega_{n+1}$. The original $\det\|p(x_i,\omega_j)\|$ has the sign opposite to that of the determinant in which the last column is inserted between the first and second columns (an odd number, n − 1, of transpositions). This second determinant is positive, so $\det\|p(x_i,\omega_j)\| < 0$. Applying Lemma 3 again, we see that $q_n$ is negative for $\omega_1 < \omega < \omega_2$, where h(ω) < 0, so the integrand is nonnegative there as well.

Repeating this line of argument, we find that h(ω) and $q_n(x,\omega,\omega_n,\dots,\omega_1)$ always have the same sign. The integrand in (2.15) is therefore nonnegative, and it is positive wherever h(ω) ≠ 0. Hence, unless the spectrum of F lies in the zero set of h, $g_n(x,\omega_n,\omega_{n-1},\dots,\omega_1) > 0$ for all x. By Lemma 3, $q_{n-1}(x,\omega_n,\omega_{n-1},\dots,\omega_1)$ is likewise positive for all x. Since both $g_n(x,\omega_n,\dots,\omega_1) > 0$ and $q_{n-1}(x,\omega_n,\dots,\omega_1) > 0$ for all x, the definition of $g_n$ shows that $g_{n-1}(x,\omega_{n-1},\dots,\omega_1)$ changes sign at most once and has at most one zero. Similarly, $q_{n-2}(x,\omega_{n-1},\dots,\omega_1) > 0$ for all x, and this implies that $g_{n-2}(x,\omega_{n-2},\dots,\omega_1)$ changes sign at most twice and has at most two zeros counting multiplicities. At the end of this chain of implications, g(x) changes sign at most n times and has at most n zeros counting multiplicities.

Suppose h(ω) > 0 for $\omega < \omega_1$, but n is odd. Reasoning analogous to the even case then shows that the integrand in (2.15) is never positive. Thus $g_n(x,\omega_n,\dots,\omega_1) < 0$ for all x, and this implies the desired conclusion. A similar argument proves the result when h(ω) < 0 for $\omega < \omega_1$.

By following the sequence of implications in reverse order, one can check that if g changes sign n times, then it changes sign in the same order as h(ω). This gives us Corollary 2.

Corollary 2. If the number of sign changes of g is n = V(h), then g and h change sign in the same order.
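Here is a numerical illustration of Theorem 3 and Corollary 2. All choices are hypothetical: the kernel is the normal location density (strictly P.T. ∞), F is a discrete measure with six atoms, and h(ω) = +1 for |ω| > 1 and −1 for |ω| < 1, so h changes sign twice on the support of F. The theorem says that g has at most two sign changes. Corollary 2 adds that if there are exactly two, they occur in the order +, −, +. Sampling g on a grid can only undercount sign changes, so this is a sanity check rather than a verification.

```python
import math


def sign_pattern(values):
    """Signs of a sequence with zeros dropped and repeated signs collapsed."""
    pattern = []
    for v in values:
        s = (v > 0) - (v < 0)
        if s != 0 and (not pattern or pattern[-1] != s):
            pattern.append(s)
    return pattern


def p(x, w):
    # normal location kernel, strictly Polya Type infinity (constant omitted)
    return math.exp(-(x - w) ** 2 / 2)


def h(w):
    # +1 on |w| > 1, -1 on |w| < 1: two sign changes
    return 1.0 if abs(w) > 1 else -1.0


# Hypothetical discrete measure F as (atom, mass) pairs.
atoms = [(-3.0, 0.5), (-2.0, 1.0), (-0.5, 1.5), (0.5, 1.5), (2.0, 1.0), (3.0, 0.5)]


def g(x):
    # g(x) = integral of p(x, w) h(w) dF(w) for the discrete F above
    return sum(mass * p(x, w) * h(w) for w, mass in atoms)


xs = [-8 + 0.01 * k for k in range(1601)]
print("h on the atoms of F:", sign_pattern([h(w) for w, _ in atoms]))
print("g on the grid:      ", sign_pattern([g(x) for x in xs]))
```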

Corollary 3. Suppose p is Pólya Type ∞ but not strictly so. The results of Theorem 3 still hold if, for any n and any prescribed $\omega_1 < \cdots < \omega_n$, there exists a set $x_1 < \cdots < x_n$ (which may depend on $\omega_1, \dots, \omega_n$) such that $\det\|p(x_i,\omega_j)\| > 0$.

This can be established by approximating p(x, ω) by $p_\sigma(x,\omega)$ as in (2.13).

The condition that p be strictly Pólya Type ∞ can also be weakened in a way that differs from Corollary 3. The results of the theorem still hold if p is strictly Pólya Type n + 1, one more than the number of sign changes of h.

Completely analogous results can also be proved about the function

$$\psi(\omega) = \int p(x,\omega)\,h(x)\,d\mu(x).$$


Sec. 4. Addition theorem for P.T. distributions. The following two theorems present results which are interesting in themselves but will not be used in the subsequent sections. They illustrate some further smoothing properties of Pólya frequency functions.

Theorem 4. Let X and Y be independent real random variables having continuously differentiable densities f and g, and let $f(x-\omega) = f^*(x,\omega)$ be strictly Pólya Type ∞. If g has k modes, then the density of Z = X + Y,

$$h(z) = \int_{-\infty}^{\infty} f(t)\,g(z-t)\,dt,$$

has at most k modes. Furthermore, for any constant c, h − c changes sign no more often than g − c.







To simplify the discussion, we assume that differentiation can be performed under the integral sign.

Proof.

$$h'(z) = \int_{-\infty}^{\infty} f(t)\,g'(z-t)\,dt = \int_{-\infty}^{\infty} f(z-y)\,g'(y)\,dy.$$

Since the number of modes of a density is bounded above by the number of changes of sign of its derivative, the first conclusion follows from Theorem 3. The second conclusion also follows from Theorem 3, since $\int f(t)\,dt = 1$ gives

$$h(z) - c = \int_{-\infty}^{\infty} f(t)\,[g(z-t) - c]\,dt = \int_{-\infty}^{\infty} f(z-y)\,[g(y) - c]\,dy.$$

Z. W. Birnbaum calls a real random variable X less peaked than another real random variable Y if $\Pr\{|X| \le u\} \le \Pr\{|Y| \le u\}$ for all u > 0. He proves that if X is less peaked than Y, and Z is independent of X and Y and has a symmetric unimodal density, then X + Z is less peaked than Y + Z (Ref. [3]). With the aid of Theorem 3 we can generalize both this definition and the result.

Def. 3. A real random variable X is less peaked of order n than another real random variable Y if $q(u) = \Pr\{|X| \le u\} - \Pr\{|Y| \le u\}$ either changes sign n times and is nonpositive for sufficiently large u, or changes sign fewer than n times.

Birnbaum's definition of less peaked corresponds to less peaked of order 0.

Theorem 5. Let X be less peaked of order n than Y. Suppose Z is independent of X and Y and has a symmetric density h such that $h(z-\omega) = h^*(z,\omega)$ is strictly Pólya Type ∞. Then X + Z is less peaked of order n than Y + Z.

Proof. If F and G are the cdf's of X and Y respectively, then

$$\Pr\{|X+Z| \le u\} - \Pr\{|Y+Z| \le u\} = \int_{-\infty}^{\infty}[F(s) - F(-s) - G(s) + G(-s)]\,h(u-s)\,ds.$$

The first factor in the integrand is an odd function of s which changes sign at most n times for positive s, and hence at most 2n + 1 times altogether. The second factor is a symmetric, strictly Pólya Type ∞ density function. By Theorem 3, the integral is an odd function of u which changes sign at most 2n + 1 times, and hence at most n times for positive u. Furthermore, suppose it changes sign n times for positive u. It then changes sign 2n + 1 times altogether, so by Corollary 2 it has the same sign for very large u as F(s) − F(−s) − G(s) + G(−s) has for very large s.
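The simplest case of Theorem 5 (n = 0, Birnbaum's setting) can be checked with normal distributions. The sketch uses `statistics.NormalDist` from Python's standard library, whose `+` operator returns the distribution of the sum of independent normal variables. In this hypothetical example X ~ N(0, 2²), Y ~ N(0, 1) and Z ~ N(0, 1). The quantity q(u) of Def. 3 should be negative for all u > 0, both for (X, Y) and for (X + Z, Y + Z). Evaluating q on a grid is only a spot check.

```python
from statistics import NormalDist


def peakedness_gap(X, Y, u):
    """q(u) = Pr{|X| <= u} - Pr{|Y| <= u} from Def. 3, for continuous X and Y
    symmetric about 0 (so Pr{|X| <= u} = 2 F(u) - 1)."""
    return (2 * X.cdf(u) - 1) - (2 * Y.cdf(u) - 1)


X, Y, Z = NormalDist(0, 2), NormalDist(0, 1), NormalDist(0, 1)
us = [0.1 * k for k in range(1, 101)]

# Z ~ N(0, 1) is symmetric and its translation kernel is strictly Polya Type infinity.
gap_before = [peakedness_gap(X, Y, u) for u in us]
gap_after = [peakedness_gap(X + Z, Y + Z, u) for u in us]  # sds sqrt(5) and sqrt(2)
print(max(gap_before), max(gap_after))
```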


Part II. Application of Pólya Type Distributions to Classical Results of the Neyman-Pearson Variety.


Sec. 1. Preliminaries. A number of classical results can be derived when the underlying distribution is of Pólya Type. These results concern Type A regions, uniformly most powerful tests, unbiased tests and the likelihood ratio test. They unify and strengthen essentially all previously known results. Much of the literature on the theory of testing statistical hypotheses deals with special cases [4]. The present approach is more general: it yields much stronger results, together with constructive methods for determining the specialized tests.

The general situation is that of testing a null hypothesis against its alternative, i.e., a two-action problem. The parameter space Ω is the real line. There are two measurable loss functions $L_1$ and $L_2$ on Ω, where $L_i(\omega)$ is the loss incurred if action i is taken and ω is the true parameter point. When ω is the true state of nature, action 1 is preferred on the set where $L_1(\omega) < L_2(\omega)$, and action 2 is preferred on the set where $L_2(\omega) < L_1(\omega)$. The two actions are indifferent at all other points.

We shall assume that $L_1 - L_2 = h$ changes sign exactly n times. The value of n depends on the problem under consideration but remains fixed within each problem. The points where $L_1 - L_2$ changes sign are assumed to be isolated; they are $\omega_1, \omega_2, \dots, \omega_n$. For definiteness we assume that $L_1(\omega) - L_2(\omega)$ is positive for $\omega < \omega_1$.

Two successive $\omega_i$'s may be equal, but no more than two may coincide. If $\omega_i = \omega_{i+1}$, then $[L_1(\omega) - L_2(\omega)][L_1(\omega') - L_2(\omega')] > 0$ for $\omega < \omega_i < \omega'$ (ω, ω′ near $\omega_i$), and $[L_1(\omega) - L_2(\omega)][L_1(\omega_i) - L_2(\omega_i)] < 0$ for the same choice of ω. This corresponds to the case where one action is preferred in a neighborhood of $\omega_i$, except at $\omega = \omega_i$ itself, where the other action is preferred.

Let φ be a randomized decision procedure. Thus φ is a measurable function on the real line, and φ(x) is the probability of taking action 2 (accepting the alternative hypothesis) when x is the observed value of the real random variable X. (X is usually a sufficient statistic based on several observations.) Consider decision procedures φ of the form

$$\varphi(x) = \begin{cases} 1 & \text{for } x_{2i} < x < x_{2i+1},\quad i = 0, 1, \dots, [n/2],\\ \lambda_i & \text{for } x = x_i,\quad 0 \le \lambda_i \le 1,\\ 0 & \text{elsewhere},\end{cases}$$

where [a] denotes the greatest integer ≤ a, $x_0 = -\infty$ and $x_{n+1} = +\infty$. All randomized decision procedures of this form are said to belong to the class $\mathfrak{M}_n$ of monotone procedures. If the $x_i$'s are all distinct, then action 2 is taken in [n/2] + 1 intervals and action 1 in [(n + 1)/2] intervals, and randomization is possible at n points. Strategies with fewer intervals but essentially the same form also belong to $\mathfrak{M}_n$; these correspond to the case where the $x_i$'s are not all distinct.
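Here the monotone procedures of $\mathfrak{M}_n$ are written as a function of the observation x. The sketch follows the form of φ displayed above: φ = 1 on $(x_{2i}, x_{2i+1})$, $\varphi = \lambda_i$ at the cut point $x_i$, and φ = 0 elsewhere, with $x_0 = -\infty$. For even n it also assumes $x_{n+1} = +\infty$. The function name and argument layout are hypothetical. With n = 1 the function gives a one-sided test that rejects for small x. With n = 2 it gives a two-sided test that rejects in both tails.

```python
import math


def monotone_procedure(x, cuts, lambdas):
    """phi(x) for a procedure in M_n with cut points x_1 <= ... <= x_n (n = len(cuts)):
    1 on (x_{2i}, x_{2i+1}), lambda_i at x = x_i, 0 elsewhere, where x_0 = -inf and
    x_{n+1} = +inf.  The value is the probability of taking action 2."""
    if list(cuts) != sorted(cuts) or len(lambdas) != len(cuts):
        raise ValueError("cuts must be nondecreasing, with one lambda per cut")
    for c, lam in zip(cuts, lambdas):
        if x == c:                       # randomize at a cut point
            return lam
    pts = [-math.inf] + list(cuts) + [math.inf]
    for k in range(len(pts) - 1):
        if pts[k] < x < pts[k + 1]:
            return 1.0 if k % 2 == 0 else 0.0   # intervals (x_0,x_1), (x_2,x_3), ...
    return 0.0


# n = 1: one-sided test rejecting for small x; n = 2: two-sided test rejecting in both tails.
print([monotone_procedure(x, [0.5], [0.25]) for x in (0.0, 0.5, 1.0)])
print([monotone_procedure(x, [-1.96, 1.96], [0.0, 0.0]) for x in (-3.0, 0.0, 3.0)])
```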

The following theorem and lemma will be used in the subsequent discussion. 
For proofs and greater detail the reader is referred to [1]. It should be remarked 
that the proofs of Theorem 6 and Lemma 4 can be based essentially on Theorem 3. 

Theorem 6. If p(x, ω) belongs strictly to $\mathcal{P}_{n+1}$, then for any randomized decision procedure φ not in $\mathfrak{M}_n$ there exists a unique φ* in $\mathfrak{M}_n$ such that $\rho(\omega,\varphi^*) \le \rho(\omega,\varphi)$, with strict inequality everywhere except at $\omega = \omega_1, \omega_2, \dots, \omega_n$. Here ρ is given by

$$\rho(\omega,\varphi) = \int [(1-\varphi(x))L_1(\omega) + \varphi(x)L_2(\omega)]\,p(x,\omega)\,d\mu(x).$$

Moreover, the set $\mathfrak{M}_n$ constitutes a minimal complete class of strategies.

If the underlying distribution p(x, ω) does not belong strictly to $\mathcal{P}_{n+1}$, then the strategies of $\mathfrak{M}_n$ still constitute a complete class, but the uniqueness and minimality conclusions of Theorem 6 are not valid in general. However, many of the foregoing results can be extended by the general device of approximating non-strict Pólya Type distributions by strict ones (see Part I). This is left as an exercise for the reader.

Lemma 4. If φ′ and φ″ are two strategies in $\mathfrak{M}_n$ and p is strictly Pólya Type n + 1, then

$$\int [\varphi'(x) - \varphi''(x)]\,p(x,\omega)\,d\mu(x)$$

has fewer than n zeros counting multiplicities.

From now on, "assume strictly Pólya Type n" means that the underlying distribution belongs strictly to $\mathcal{P}_n$.


Sec. 2. Uniformly most powerful one-sided tests. Uniformly most powerful tests for the classical exponential family of distributions and for other specific examples were treated in Lehmann's notes [4]. What follows is a slight extension of that treatment to Pólya Type distributions.

Assume strictly $\mathcal{P}_2$. Then for a one-sided testing problem, a uniformly most powerful level α test exists.

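A one-sided testing problem occurs when

$$L_1(\omega) = \begin{cases} 1, & \omega < \omega_0,\\ 0, & \omega \ge \omega_0,\end{cases} \qquad\text{and}\qquad L_2(\omega) = \begin{cases} 0, & \omega < \omega_0,\\ 1, & \omega \ge \omega_0,\end{cases}$$

for some $\omega_0$. Then $\rho(\omega,\varphi) = \int\varphi(x)\,p(x,\omega)\,d\mu(x)$ for $\omega \ge \omega_0$, and $\rho(\omega,\varphi) = \int(1-\varphi(x))\,p(x,\omega)\,d\mu(x)$ for $\omega < \omega_0$.

Consider the function $f_\varphi(\omega) = \int\varphi(x)\,p(x,\omega)\,d\mu(x)$, where $\varphi \in \mathfrak{M}_1$. Then $f_\varphi(\omega) - c = \int(\varphi(x) - c)\,p(x,\omega)\,d\mu(x)$, where c is an arbitrary positive constant. Since $\varphi \in \mathfrak{M}_1$, φ − c changes sign at most once, and if it does, the change is from + to −. Therefore, by Theorem 3 and Corollary 2, $f_\varphi - c$ changes sign at most once, and in the same direction if at all. This implies that $f_\varphi$ is a monotone decreasing function of ω.

Consider the unique monotone test φ* (unique [μ]) for which $f_{\varphi^*}(\omega_0) = \int\varphi^*(x)\,p(x,\omega_0)\,d\mu(x) = \alpha$. For any other level α monotone test $\varphi_1$, Lemma 4 shows that $f_{\varphi_1}$ is uniformly smaller than $f_{\varphi^*}$, so φ* is best among the monotone tests. Now consider any nonmonotone level α test φ. By Theorem 6 there is a unique monotone test $\varphi_2$ which is better than φ except at $\omega_0$, where equality holds. Since $f_{\varphi_2}(\omega_0) = f_\varphi(\omega_0)$, $\varphi_2$ is itself a level α monotone test, and so φ* also improves on φ.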
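Here is a worked instance of the one-sided construction. It assumes, hypothetically, that X ~ N(ω, 1), so that p(x, ω) is the normal location density, which is strictly P.T. ∞. The test φ* takes action 2 for $x < x_1$, with $x_1 = \omega_0 + \Phi^{-1}(\alpha)$ chosen so that $f_{\varphi^*}(\omega_0) = \alpha$. Its power function $f_{\varphi^*}(\omega) = \Phi(x_1 - \omega)$ is decreasing, as the argument above shows it must be. For comparison, the sketch also evaluates the usual two-sided level-α test, which is not in $\mathfrak{M}_1$. As the result predicts, its power is smaller for $\omega < \omega_0$. Only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # standard normal cdf
z_of = NormalDist().inv_cdf     # standard normal quantile function


def one_sided_test(w0, alpha):
    """Monotone test phi*(x) = 1 for x < x1, 0 otherwise, for X ~ N(w, 1) and the
    alternative w < w0; x1 is chosen so that f_{phi*}(w0) = alpha."""
    x1 = w0 + z_of(alpha)
    return x1, (lambda w: Phi(x1 - w))   # (cutoff, power function f_{phi*})


def two_sided_power(w, w0, alpha):
    """Power at w of the level-alpha test that rejects when |x - w0| > z_{1-alpha/2}."""
    z = z_of(1 - alpha / 2)
    return Phi(w0 - z - w) + 1 - Phi(w0 + z - w)


x1, f_star = one_sided_test(0.0, 0.05)
print("cutoff x1 =", round(x1, 4))
for w in (-2.0, -1.0, -0.5):
    print(w, round(f_star(w), 3), round(two_sided_power(w, 0.0, 0.05), 3))
```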





294 SAMUEL KARLIN 


uniformly most powerful tests, unbiased tests, the likelihood ratio test, ete. They 
unify and strengthen essentially all previously known results. A great deal of 
the literature on the theory of testing statistical hypotheses deals with special 
cases [4], whereas this approach is of a more general nature and yields much 
stronger results and at the same time constructive methods in determining the 
specialized tests. 

The general situation we are dealing with is that of testing a null hypothesis 
against its alternative hypothesis, i.e., a 2-action problem. The parameter space 
Q is the real line. There exist two measurable loss functions L; and L, on 2 where 
L{w) is the loss incurred if action 7 is taken and w is the true parameter point. 
The set in which L,;(w) < Le(w) is the set in which action 1 is preferred when w 
is the true state of nature, and the set in which Lo(w) < L,(w) is the set in which 
action 2 is preferred. The two actions are indifferent at all other points. We shall 
assume that L,; — L, = h changes sign exactly n times where n will vary according 
to the problem we are considering but will remain constant within each problem. 
The points where L, — Ll, changes sign are assumed isolated and are wr, we, 

- , wx. For the sake of definiteness we shall assume that L,;(w) — L2(w) is 
positive for w < w;. Two successive w;’s may be equal but not more than two. 
In fact, if w; = wt41, then [Zy(w) — Le(w)] [Li(w’) — Le(w’)] > O for w < wy < w’ 
(w, w’ near w;) and [L;(w) — Le(w)}. [Li(w) — Le(wt)] < 0 for the same choice 
of w. This corresponds to the case where one action is preferred in a neighborhood 
of w; except for w = w; where the other action is preferred. 

Let @ be a randomized decision procedure. ¢ is a measurable function on the 
real line, and ¢(z) is the probability of taking action 2 (accepting the alternative 
hypothesis) if z is the observed value of the real random variable X. (z is usually 
a sufficient statistic based on several observations.) Consider decision procedures 
¢ of the form 


\lfor2;<2< Xoi4i, 


Of) = \, sega OSS 1, 


\0 elsewhere 


[a] denotes the greatest integer S a. x) = —~. All randomized decision proce- 
dures of this form will be said to belong to the class 9M, of monotone procedures. 
If the x,’s are all distinct, then action 2 is preferred in n intervals, action 1 in 
n or n — 1, and at n points there is possible randomization. Strategies with 
fewer intervals but essentially the same form also belong to MM, ; this corre- 
sponds to the case where the z,’s are not all distinct. 

The following theorem and lemma will be used in the subsequent discussion. 
For proofs and greater detail the reader is referred to [1]. It should be remarked 
that the proofs of Theorem 6 and Lemma 4 can be based essentially on Theorem 3. 

THEOREM 6. If $p(x, \omega)$ belongs strictly to $\mathcal{P}_{n+1}$, then for any randomized decision procedure $\psi$ not in $\mathfrak{M}_n$ there exists a unique $\phi$ in $\mathfrak{M}_n$ such that $\rho(\omega, \phi) \le \rho(\omega, \psi)$, with strict inequality everywhere except for $\omega = \omega_1, \omega_2, \ldots, \omega_n$. Here $\rho$ is given by

$$\rho(\omega, \phi) = \int [(1 - \phi(x))L_1(\omega) + \phi(x)L_2(\omega)]\, p(x, \omega)\, d\mu(x).$$

Moreover, the set $\mathfrak{M}_n$ constitutes a minimal complete class of strategies.
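To make the risk formula of Theorem 6 concrete, here is a minimal Python sketch for the normal location family $X \sim N(\omega, 1)$, which is strictly Pólya Type of every order. This family, the loss functions and the helper names `power` and `risk` are assumptions chosen for illustration, not taken from the text, and only nonrandomized procedures are handled.

```python
from math import erfc, sqrt
from typing import Callable, Sequence


def norm_cdf(z: float) -> float:
    # Standard normal CDF written with erfc so that both tails stay accurate.
    return 0.5 * erfc(-z / sqrt(2.0))


def power(points: Sequence[float], omega: float) -> float:
    """f_phi(omega): probability of action 2 when X ~ N(omega, 1) and phi is the
    nonrandomized procedure in M_n with phi = 1 on (x_{2i}, x_{2i+1}), x_0 = -inf."""
    edges = [-float("inf")] + list(points) + [float("inf")]
    total = 0.0
    for i in range(0, len(edges) - 1, 2):  # the intervals on which phi = 1
        total += norm_cdf(edges[i + 1] - omega) - norm_cdf(edges[i] - omega)
    return total


def risk(omega: float, points: Sequence[float],
         L1: Callable[[float], float], L2: Callable[[float], float]) -> float:
    # rho(omega, phi) = (1 - f_phi(omega)) L1(omega) + f_phi(omega) L2(omega)
    f = power(points, omega)
    return (1.0 - f) * L1(omega) + f * L2(omega)


def L1(w: float) -> float:
    # illustrative two-sided losses with action 1 preferred on [-0.5, 0.5]
    return 0.0 if -0.5 <= w <= 0.5 else 1.0


def L2(w: float) -> float:
    return 1.0 - L1(w)


z = 1.959963984540054  # 0.975 quantile of N(0, 1)
print(power([-z, z], 0.0), risk(3.0, [-z, z], L1, L2))
```

Inside the interval where action 1 is preferred the risk is the rejection probability; outside it is one minus that probability.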

If the underlying distribution $p(x, \omega)$ does not strictly belong to $\mathcal{P}_{n+1}$, then the strategies of $\mathfrak{M}_n$ still constitute a complete class, but the uniqueness and minimality conclusions of Theorem 6 are not valid in general. However, by a general device of approximating non-strict Pólya Type distributions by strict Pólya Type distributions (see Part I), many of the foregoing results can be extended. This is left as an exercise for the reader.

Lemma 4. If $\phi^1$ and $\phi^2$ are two distinct strategies in $\mathfrak{M}_n$ and $p$ is strictly Pólya Type $n + 1$, then

$$\int [\phi^1(x) - \phi^2(x)]\, p(x, \omega)\, d\mu(x)$$

has fewer than $n$ zeros, counting multiplicities.

In what follows, when we say "assume strictly Pólya Type $n$", we mean that the underlying distribution belongs strictly to $\mathcal{P}_n$.
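Lemma 4 can be checked numerically in a simple case. The Python sketch below again assumes, for illustration, the normal location family $N(\omega, 1)$ (strictly Pólya Type 3, among others) and two arbitrarily chosen nonrandomized tests in $\mathfrak{M}_2$. It counts the sign changes of the difference of their power functions on a grid; with $n = 2$ the lemma allows at most one zero. All helper names are hypothetical.

```python
from math import erfc, sqrt


def norm_cdf(z: float) -> float:
    return 0.5 * erfc(-z / sqrt(2.0))


def norm_sf(z: float) -> float:
    # upper tail 1 - Phi(z), computed directly to avoid cancellation
    return 0.5 * erfc(z / sqrt(2.0))


def two_tailed_power(x1: float, x2: float, omega: float) -> float:
    # phi = 1 on (-inf, x1) and on (x2, inf): a nonrandomized test in M_2
    return norm_cdf(x1 - omega) + norm_sf(x2 - omega)


def sign_changes(values) -> int:
    signs = [v > 0.0 for v in values if v != 0.0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


grid = [-6.0 + 0.01 * k for k in range(1201)]
diff = [two_tailed_power(-1.0, 1.0, w) - two_tailed_power(-0.5, 1.5, w) for w in grid]
print("sign changes of f_phi1 - f_phi2 on the grid:", sign_changes(diff))
```

Here $\phi^1 - \phi^2$ itself changes sign once (from $-$ to $+$), and the variation diminishing property carries that bound over to the power difference.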


Sec. 2. Uniformly most powerful one-sided tests. The case of uniformly most powerful tests for the classical exponential family of distributions and other specific examples was treated in Lehmann's notes [4]. The present section is a slight extension to the situation of Pólya Type distributions.

Assume strictly $\mathcal{P}_2$. For a one-sided testing problem, a uniformly most powerful level $\alpha$ test exists.

A one-sided testing problem occurs when

$$L_1(\omega) = \begin{cases} 1, & \omega < \omega_1, \\ 0, & \omega \ge \omega_1, \end{cases} \qquad L_2(\omega) = \begin{cases} 0, & \omega < \omega_1, \\ 1, & \omega \ge \omega_1, \end{cases}$$

for some $\omega_1$. Then $\rho(\omega, \phi) = \int \phi(x)p(x, \omega)\, d\mu(x)$ for $\omega \ge \omega_1$ and $\rho(\omega, \phi) = \int (1 - \phi(x))\, p(x, \omega)\, d\mu(x)$ for $\omega < \omega_1$. Consider the function $f_\phi(\omega) = \int \phi(x)p(x, \omega)\, d\mu(x)$ where $\phi \in \mathfrak{M}_1$. Then $f_\phi(\omega) - c = \int (\phi(x) - c)\, p(x, \omega)\, d\mu(x)$, where $c$ is an arbitrary positive constant. Since $\phi \in \mathfrak{M}_1$, $\phi - c$ changes sign at most once, and in the direction from $+$ to $-$ if at all. Therefore, by Theorem 3 and Corollary 2, $f_\phi - c$ changes sign at most once, and in the same direction if at all. This implies that $f_\phi$ is a monotone decreasing function of $\omega$. Consider the unique monotone test $\phi^*$ (unique $[\mu]$) for which $f_{\phi^*}(\omega_1) = \int \phi^*(x)p(x, \omega_1)\, d\mu(x) = \alpha$. For any other level $\alpha$ monotone test $\phi_1$, the corresponding $f_{\phi_1}$ is uniformly smaller than $f_{\phi^*}$ by Lemma 4, so that $\phi^*$ is best among the monotone tests. Now consider any nonmonotone level $\alpha$ test $\psi$. By Theorem 6 there is a unique monotone test $\phi_1$ which is better than $\psi$ except at $\omega_1$, where equality holds. But since $f_{\phi^*} \ge f_{\phi_1}$, $\phi^*$ also improves on $\psi$.
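For the normal location family $N(\omega, 1)$ (assumed here for illustration; it is strictly Pólya Type 2), constructing $\phi^*$ reduces to a quantile computation: $\phi^* = 1$ on $(-\infty, x_1)$ with $x_1 = \omega_1 + \Phi^{-1}(\alpha)$. The minimal Python sketch below uses only the standard library's `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def ump_one_sided_cutoff(omega1: float, alpha: float) -> float:
    """Cutoff x_1 of phi* = 1 on (-inf, x_1), chosen so that f_{phi*}(omega1) = alpha
    when X ~ N(omega, 1); action 2 (rejection) is taken for small x."""
    return omega1 + STD_NORMAL.inv_cdf(alpha)


def power(x1: float, omega: float) -> float:
    # f_phi(omega) = P_omega(X < x1) for the monotone test with cutoff x1
    return STD_NORMAL.cdf(x1 - omega)


omega1, alpha = 0.0, 0.05
x1 = ump_one_sided_cutoff(omega1, alpha)
grid = [-3.0 + 0.1 * k for k in range(61)]
powers = [power(x1, w) for w in grid]
# Any other level-alpha monotone test has a cutoff below x1, hence smaller power.
competitor = [power(x1 - 0.3, w) for w in grid]
print(x1, power(x1, omega1))
```

The computed power is decreasing in $\omega$, and a competing monotone test with a smaller cutoff has uniformly smaller power, as the argument above predicts.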







Sec. 3. Nonexistence of uniformly most powerful two-sided tests. Assume strictly $\mathcal{P}_3$. For the two-sided testing problem, uniformly most powerful level $\alpha$ tests do not exist in general. We now discuss this in greater detail.

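A two-sided testing problem is determined by

$$L_1(\omega) = \begin{cases} 0, & \omega_1 \le \omega \le \omega_2, \\ 1, & \text{elsewhere}, \end{cases} \qquad L_2(\omega) = \begin{cases} 1, & \omega_1 \le \omega \le \omega_2, \\ 0, & \text{elsewhere}, \end{cases}$$

for some $\omega_1 \le \omega_2$. By virtue of Theorem 6 we can restrict our consideration exclusively to monotone tests, i.e., tests in $\mathfrak{M}_2$. Let $\phi_1$ be a monotone test and $f_{\phi_1}(\omega) = \int \phi_1(x)p(x, \omega)\, d\mu(x)$ the corresponding power function. Consider the one-sided testing problem obtained from the two-sided problem above by removing one tail:

$$L_1'(\omega) = \begin{cases} 1, & \omega < \omega_1, \\ 0, & \omega \ge \omega_1, \end{cases} \qquad L_2'(\omega) = \begin{cases} 0, & \omega < \omega_1, \\ 1, & \omega \ge \omega_1. \end{cases}$$

The existence of a u.m.p. level $\alpha$ test $\phi^*$ for this problem was shown in Section 2. Suppose $\phi_1 \notin \mathfrak{M}_1$. Then $f_{\phi^*}(\omega) > f_{\phi_1}(\omega)$ for $\omega < \omega_1$, and $\phi_1$ is not u.m.p. for the two-sided test; the strict inequality is assured by Theorem 6. Suppose $\phi_1 \in \mathfrak{M}_1$. Then $f_{\phi_1}$ is monotone decreasing, which means that for $\omega > \omega_2$ the test $\phi \equiv \alpha$ is better.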

A word should be said about what happens when $p$ is not strictly Pólya Type 3. When $\omega_1 = \omega_2$ and $p(x, \omega)$ is the rectangular distribution on the interval $[0, \omega]$, a u.m.p. test exists. The acceptance region is $[x', \omega_1]$, where $x' = \alpha\omega_1$. When $p(x, \omega)$ is the rectangular distribution on $[0, \omega]$ but $\omega_1 < \omega_2$, no u.m.p. test exists; i.e., when the null hypothesis is an interval, no u.m.p. test exists. The rectangular distribution on $[0, \omega]$ is Pólya Type 3, but it is not strictly so. It even satisfies the condition that for every $\omega_1 < \omega_2 < \omega_3$ there exists a set $x_1 < x_2 < x_3$ such that $\det \| p(x_i, \omega_j) \| > 0$. Thus the condition of strictness in this result seems essential.
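The rectangular example can be checked directly. The Python sketch below uses a hypothetical helper name. It computes the rejection probability of the test whose acceptance region is $[\alpha\omega_1, \omega_1]$ when $X$ is uniform on $[0, \omega]$. This probability equals $\alpha$ at $\omega = \omega_1$, equals $\min(1, \alpha\omega_1/\omega)$ for $\omega < \omega_1$, and equals $1 - (1 - \alpha)\omega_1/\omega$ for $\omega > \omega_1$.

```python
def rectangular_power(omega: float, omega1: float, alpha: float) -> float:
    """Rejection probability, under X uniform on [0, omega] (omega > 0), of the test
    that accepts omega = omega1 exactly when alpha * omega1 <= X <= omega1."""
    lower, upper = alpha * omega1, omega1
    p_below = min(lower, omega) / omega        # P(X < alpha * omega1)
    p_above = max(omega - upper, 0.0) / omega  # P(X > omega1)
    return p_below + p_above


for om in (0.05, 1.0, 2.0, 4.0):
    print(om, rectangular_power(om, 2.0, 0.05))
```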


Sec. 4. Uniformly most powerful unbiased tests. 

(a) Assume strictly $\mathcal{P}_3$. For a two-sided testing problem a u.m.p. unbiased test exists. For this special testing problem the result was previously known for scattered examples.

A test $\phi$ is unbiased if and only if $f_\phi(\omega) \le \alpha$ for $\omega_1 \le \omega \le \omega_2$ and $f_\phi(\omega) \ge \alpha$ for $\omega \le \omega_1$ and $\omega \ge \omega_2$. Consider the test $\phi \equiv \alpha$. By Theorem 6 there exists a unique test $\phi^* \in \mathfrak{M}_2$ which uniformly improves in terms of risk over $\phi \equiv \alpha$ except at $\omega_1$ and $\omega_2$. Clearly $\phi^*$ is unbiased. It is determined by $x_1^*, x_2^*, \lambda_1^*$, and $\lambda_2^*$, which are the values satisfying $\int \phi^*(x)p(x, \omega_i)\, d\mu(x) = \alpha$, $i = 1, 2$, when $\omega_1 < \omega_2$; when $\omega_1 = \omega_2$ they are determined by $\int \phi^*(x)p(x, \omega_1)\, d\mu(x) = \alpha$ and $(d/d\omega)\int \phi^*(x)p(x, \omega)\, d\mu(x)\big|_{\omega_1} = 0$. In the latter case the null hypothesis consists of a single point. Lemma 4 shows that fixing the value of $f_{\phi^*}$ at two points, or fixing $f_{\phi^*}$ and its derivative at one point, is sufficient to determine $\phi^*$ uniquely. Suppose there were an unbiased level $\alpha$ test $\phi$ for which $f_\phi(\omega) > f_{\phi^*}(\omega)$ for some $\omega < \omega_1$ or $\omega > \omega_2$. Then there would have to be a monotone test $\phi_1$ which improves on $\phi$ except at $\omega_1$ and $\omega_2$; such a test also improves on $\phi \equiv \alpha$ and differs from $\phi^*$. But this contradicts the fact that $\phi^*$ is the unique monotone test uniformly better than $\phi \equiv \alpha$ except at $\omega_1$ and $\omega_2$. Thus $\phi^*$ is the u.m.p. unbiased level $\alpha$ test.
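For the normal location family $N(\omega, 1)$ (an assumed example), the two equations determining $\phi^*$ can be solved numerically. The minimal Python sketch below uses hypothetical helper names. It relies on the symmetry of the normal family: the solution is centred at $(\omega_1 + \omega_2)/2$, so the system $\int \phi^* p(x, \omega_i)\, d\mu = \alpha$, $i = 1, 2$, reduces to one monotone equation in the half-width, solved by bisection. When $\omega_1 = \omega_2$ the symmetric solution also satisfies the derivative condition.

```python
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def umpu_normal(omega1: float, omega2: float, alpha: float,
                tol: float = 1e-12) -> Tuple[float, float]:
    """Breakpoints (x1, x2) of phi* = 1 on (-inf, x1) and (x2, inf) with
    f(omega1) = f(omega2) = alpha, for X ~ N(omega, 1) and omega1 <= omega2.

    By the symmetry of this family the solution is centred at
    m = (omega1 + omega2) / 2, so only the half-width c is unknown.
    """
    m, d = 0.5 * (omega1 + omega2), 0.5 * (omega2 - omega1)

    def g(c: float) -> float:
        # f(omega1) when x1 = m - c and x2 = m + c; strictly decreasing in c
        return STD_NORMAL.cdf(d - c) + 1.0 - STD_NORMAL.cdf(d + c)

    lo, hi = 0.0, d + 10.0  # g(lo) = 1 > alpha and g(hi) < alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > alpha:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return m - c, m + c


def power(x1: float, x2: float, omega: float) -> float:
    # f_phi(omega) = P_omega(X < x1) + P_omega(X > x2)
    return STD_NORMAL.cdf(x1 - omega) + 1.0 - STD_NORMAL.cdf(x2 - omega)


x1, x2 = umpu_normal(-0.5, 0.5, 0.05)
print(x1, x2, power(x1, x2, -0.5), power(x1, x2, 0.5))
```

The resulting power function stays at or below $\alpha$ on $[\omega_1, \omega_2]$ and at or above $\alpha$ outside it, which is the unbiasedness the text describes.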

(b) Assume strictly $\mathcal{P}_{n+1}$. For any preference pattern for the two-action testing problem, say one involving $n + 1$ distinct preference regions, a u.m.p. unbiased test exists.

The above argument generalizes easily to any preference pattern. The unique test $\phi^* \in \mathfrak{M}_n$ which is uniformly better than $\phi \equiv \alpha$ except at $\omega_1, \omega_2, \ldots, \omega_n$ is the u.m.p. unbiased level $\alpha$ test, where the $\omega_i$ are the change points of $L_1 - L_2$. $\phi^*$ is uniquely determined by solving the system of equations $\int \phi^*(x)p(x, \omega_i)\, d\mu(x) = \alpha$, $i = 1, 2, \ldots, n$, for $x_1, x_2, \ldots, x_n, \lambda_1, \lambda_2, \ldots, \lambda_n$. For the case where $\omega_i = \omega_{i+1}$ for some $i$, replace the equation $\int \phi^*(x)p(x, \omega_{i+1})\, d\mu(x) = \alpha$ by $(d/d\omega)\int \phi^*(x)p(x, \omega)\, d\mu(x)\big|_{\omega_i} = 0$. Lemma 4 shows that the system of equations in this latter case is still sufficient to determine $\phi^*$ uniquely.


Sec. 5. Generalization of unbiased tests. Assume strictly $\mathcal{P}_{n+1}$, and assume the two-action testing problem under consideration involves $n + 1$ preference regions. Let $\phi^0$ be an arbitrary but fixed test. There exists a test which is u.m.p. with respect to the class $\Phi_{\phi^0}$ of all tests which improve on $\phi^0$.

This generalizes the concept of unbiased tests, because the class of unbiased tests can be defined as the class of all tests which improve on the test $\phi \equiv \alpha$.

There exists a unique monotone test $\phi^*$ which improves on $\phi^0$: $f_{\phi^*}$ lies above $f_{\phi^0}$ in those intervals in which action 2 is preferred to action 1, and below it in the other intervals. Some intervals may be degenerate and consist of a single point. Two tests in $\mathfrak{M}_n$ cannot both improve on $\phi^0$, since both must have the same power as $\phi^0$ at $\omega_1, \omega_2, \ldots, \omega_n$, and this is impossible by Lemma 4. Any nonmonotone test improving on $\phi^0$ has a monotone test improving on it by Theorem 6, and this monotone test must be $\phi^*$.


Sec. 6. Nature of Type A critical regions. Assume strictly $\mathcal{P}_3$, and assume that for any power function differentiation inside the integral sign with respect to $\omega$ is valid. For testing a single point $\omega_1$ against all alternatives, a Type A region can be characterized as the union of at most two semi-infinite intervals, i.e., its complement is a single interval.

A Type A critical region is the critical region of any test $\phi$ which maximizes the curvature $\int \phi(x)\,(\partial^2/\partial\omega^2)p(x, \omega)\big|_{\omega_1}\, d\mu(x)$ subject to the constraints $\int \phi(x)p(x, \omega_1)\, d\mu(x) = \alpha$ and $\int \phi(x)\,(\partial/\partial\omega)p(x, \omega)\big|_{\omega_1}\, d\mu(x) = 0$. We know from Section 4 that a u.m.p. unbiased level $\alpha$ test exists. In fact, it is the unique test $\phi^* \in \mathfrak{M}_2$ for which $\int \phi^*(x)p(x, \omega_1)\, d\mu(x) = \alpha$ and $(\partial/\partial\omega)\int \phi^*(x)p(x, \omega)\, d\mu(x)\big|_{\omega_1} = 0$. By Theorem 6, any nonmonotone test satisfying the constraints has a unique monotone test improving on it, and this test must be $\phi^*$ because of the uniqueness. Thus $\phi^*$ defines the Type A critical region, and since $\phi^*$ belongs to $\mathfrak{M}_2$, the critical region is the union of at most two semi-infinite intervals.

Two remarks should be made. First, all of the known Pólya Type distributions can be differentiated inside the integral sign. Second, for general distributions it is not true that a Type A critical region is the union of two semi-infinite intervals.


Sec. 7. Type A critical regions as a function of the level of significance. Assume strictly $\mathcal{P}_3$, and assume that every power function can be differentiated inside the integral sign with respect to $\omega$. Further, suppose $p(x, \omega) > 0$ for all $x$ and $(\partial/\partial\omega)p(x, \omega_1)$ is continuous in $x$. The assertion is that the complement of the Type A critical region for testing $\omega_1$ against all alternatives at level $\alpha$ contains the complement of the Type A critical region for testing $\omega_1$ at level $\alpha'$ whenever $\alpha < \alpha'$. In other words, whenever the hypothesis is rejected at level of significance $\alpha$, it should also be rejected at level $\alpha'$, where $\alpha' > \alpha$.

This property is not true for Type A regions in general (see [5]). In order to establish the above result, we need the following lemma.

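Lemma 5. Assume $\mathcal{P}_2$. If $p(x, \omega_1) > 0$ for all $x$ and $(\partial/\partial\omega)p(x, \omega_1)$ exists and has at most isolated zeros, then there exists an $x_0$ such that

$$\frac{\partial}{\partial\omega}p(x, \omega)\Big|_{\omega_1} \begin{cases} \le 0, & x < x_0, \\ \ge 0, & x > x_0. \end{cases}$$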

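Proof. By Theorem 1,

$$\begin{vmatrix} p(x_1, \omega_1) & \dfrac{\partial}{\partial\omega}p(x_1, \omega_1) \\[1ex] p(x_2, \omega_1) & \dfrac{\partial}{\partial\omega}p(x_2, \omega_1) \end{vmatrix} \ge 0 \tag{3.1}$$

for $x_1 < x_2$. If $(\partial/\partial\omega)p(x, \omega_1)$ has no zeros, then the lemma is true with $x_0 = \pm\infty$. If $(\partial/\partial\omega)p(x, \omega_1)$ has zeros, take $x_0$ to be any one of them. Choose $x_1 = x < x_0 = x_2$. Then (3.1) reduces to $-p(x_0, \omega_1)\,(\partial/\partial\omega)p(x, \omega_1) \ge 0$. Since $p(x_0, \omega_1) > 0$, we deduce that $(\partial/\partial\omega)p(x, \omega_1) \le 0$ for $x < x_0$. Now select $x_1 = x_0 < x_2 = x$. Then (3.1) reduces to $p(x_0, \omega_1)\,(\partial/\partial\omega)p(x, \omega_1) \ge 0$. Since $p(x_0, \omega_1) > 0$, $(\partial/\partial\omega)p(x, \omega_1) \ge 0$ for $x > x_0$.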
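For the normal location family (assumed for illustration; it is strictly Pólya Type 2), $(\partial/\partial\omega)p(x, \omega_1) = (x - \omega_1)p(x, \omega_1)$, so Lemma 5 holds with $x_0 = \omega_1$. The Python sketch below checks the determinant inequality (3.1) for every pair of grid points and the resulting sign pattern; the helper names are hypothetical.

```python
from math import exp, pi, sqrt


def density(x: float, omega: float) -> float:
    # N(omega, 1) density p(x, omega)
    return exp(-0.5 * (x - omega) ** 2) / sqrt(2.0 * pi)


def d_density(x: float, omega: float) -> float:
    # exact (d/d omega) p(x, omega) = (x - omega) p(x, omega) for this family
    return (x - omega) * density(x, omega)


omega1 = 0.7
xs = [-4.0 + 0.05 * k for k in range(161)]
# left-hand side of (3.1) for every pair x_a < x_b on the grid
dets = [density(xa, omega1) * d_density(xb, omega1) - density(xb, omega1) * d_density(xa, omega1)
        for i, xa in enumerate(xs) for xb in xs[i + 1:]]
print(min(dets) > 0.0)
```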
From Section 6 we know that the Type A critical region for level $\alpha$ is given by a test $\phi_\alpha \in \mathfrak{M}_2$. For simplicity of exposition, let us suppose that no randomization is involved and that $\phi_\alpha$ is defined by the points $x_1$ and $x_2$, which must satisfy

$$\int_{x_1}^{x_2} p(x, \omega_1)\, d\mu(x) = 1 - \alpha \quad\text{and}\quad \int_{x_1}^{x_2} \frac{\partial}{\partial\omega}p(x, \omega_1)\, d\mu(x) = 0.$$

We assert that $x_1 < x_0 < x_2$, for otherwise the integrand $(\partial/\partial\omega)p(x, \omega_1)$ of the second constraint would be of one sign. (The hypothesis implies that $(\partial/\partial\omega)p(x, \omega_1)$ is not identically zero in the interval $(x_1, x_2)$.) Let $x_1'$ and $x_2'$ be the two points defining the level $\alpha'$ test, where $x_1' < x_0 < x_2'$, with $\alpha'$ near $\alpha$ ($\alpha' > \alpha$) so that $x_1'$ and $x_2'$ are near $x_1$ and $x_2$, respectively. Clearly the first constraint prevents the interval $(x_1', x_2')$ from containing the interval $(x_1, x_2)$. Suppose $x_1' \le x_1$ and $x_2' < x_2$. Subtracting the second constraint for $x_1, x_2$ from the second constraint for $x_1', x_2'$ yields

$$\int_{x_1'}^{x_1} \frac{\partial}{\partial\omega}p(x, \omega_1)\, d\mu(x) - \int_{x_2'}^{x_2} \frac{\partial}{\partial\omega}p(x, \omega_1)\, d\mu(x) = 0. \tag{3.2}$$

By Lemma 5, $(\partial/\partial\omega)p(x, \omega_1) \le 0$ between $x_1'$ and $x_1$, while between $x_2'$ and $x_2$ it is $\ge 0$ with at most isolated zeros. Hence (3.2) is impossible. A similar argument excludes the case $x_1 < x_1'$, $x_2 \le x_2'$. Thus $x_1' \ge x_1$ and $x_2' \le x_2$. The reader can furnish the modifications necessary for the argument when randomization is required at the end points.
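In the normal location family (an assumed example, strictly Pólya Type 3), the Type A region at level $\alpha$ is the two-tailed region outside an interval symmetric about $\omega_1$. The nesting asserted in this section can therefore be observed directly. A minimal Python sketch follows; `type_a_acceptance` is a hypothetical name.

```python
from statistics import NormalDist
from typing import Tuple

STD_NORMAL = NormalDist()


def type_a_acceptance(omega1: float, alpha: float) -> Tuple[float, float]:
    """Acceptance interval (x1, x2) of the Type A test of omega = omega1 when
    X ~ N(omega, 1): symmetric about omega1 (so the derivative constraint holds)
    with P_{omega1}(x1 < X < x2) = 1 - alpha."""
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return omega1 - c, omega1 + c


levels = [0.01, 0.05, 0.10, 0.20]
intervals = [type_a_acceptance(0.0, a) for a in levels]
for a, (x1, x2) in zip(levels, intervals):
    print(f"alpha={a:.2f}: accept {x1:.4f} < x < {x2:.4f}")
```

Each larger level gives an acceptance interval contained in the previous one, i.e. $x_1 \le x_1'$ and $x_2' \le x_2$.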


Sec. 8. Envelope power function. Assume strictly $\mathcal{P}_3$. For the problem of testing at level $\alpha$ a single point $\omega_1$ against all alternatives, the envelope power function $\rho(\omega)$ decreases monotonically away from $\omega_1$ in both directions. Let $U_\alpha$ be the class of tests $\phi$ such that if $\rho(\omega') > \rho(\omega'')$, where $\omega_1 \le \omega' < \omega''$ or $\omega'' < \omega' \le \omega_1$, then $\rho(\omega', \phi) > \rho(\omega'', \phi)$. It will now be established that there exists a test which is u.m.p. with respect to the class $U_\alpha$.

Theorem 6 shows that in obtaining the envelope power function the only tests that need be considered are those in $\mathfrak{M}_2$. For $\omega \ge \omega_1$, $\rho(\omega) = \rho(\omega, \phi^*)$, where $\phi^*$ is the u.m.p. level $\alpha$ one-sided test of $\omega \le \omega_1$ against $\omega > \omega_1$. For $\omega \le \omega_1$, $\rho(\omega) = \rho(\omega, \phi^{**})$, where $\phi^{**}$ is the u.m.p. level $\alpha$ one-sided test of $\omega \ge \omega_1$ against $\omega < \omega_1$. Lemma 4 shows that no other monotone test can improve on $\rho(\omega, \phi^*)$ for $\omega > \omega_1$ or on $\rho(\omega, \phi^{**})$ for $\omega < \omega_1$. Thus the $\rho$ so defined is actually the envelope power function, and from its definition it is clear that it decreases monotonically away from $\omega_1$ in both directions.
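For the normal location family (assumed for illustration), the envelope just described has the closed form $\rho(\omega) = \Phi(z_{1-\alpha} - |\omega - \omega_1|)$ at alternatives. Here $\Phi$ is the standard normal distribution function and $z_{1-\alpha}$ its $(1-\alpha)$-quantile, and $\rho$ is read, as in the definition of $U_\alpha$, as a risk (one minus power). The Python sketch below evaluates it and compares it with the risk of the u.m.p. unbiased test; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def envelope_risk(omega: float, omega1: float, alpha: float) -> float:
    """Envelope rho(omega) at an alternative omega != omega1 for X ~ N(omega, 1):
    one minus the power of the u.m.p. level-alpha one-sided test pointing toward omega."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - abs(omega - omega1))


def umpu_risk(omega: float, omega1: float, alpha: float) -> float:
    # one minus the power of the equal-tailed (u.m.p. unbiased) test of omega = omega1
    c = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    t = omega - omega1
    return 1.0 - (STD_NORMAL.cdf(-c - t) + 1.0 - STD_NORMAL.cdf(c - t))


grid = [0.1 * k for k in range(1, 50)]
right = [envelope_risk(w, 0.0, 0.05) for w in grid]
print(right[0], right[-1])
```

The envelope is symmetric about $\omega_1$, decreases away from it, and lies strictly below the risk of the unbiased test at every alternative.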

By Section 4 there exists a u.m.p. unbiased level $\alpha$ test. Now the power function of any test of $\mathfrak{M}_2$ can have at most one relative maximum. Indeed, it is sufficient to show that the set of points $\omega$ where $\int \phi(x)p(x, \omega)\, d\mu(x)$ exceeds any given constant $0 < K < 1$ consists of a single interval. As $\phi(x) - K$ changes sign twice, in the order $- + -$, we deduce that the same holds for $\int (\phi(x) - K)\, p(x, \omega)\, d\mu(x) = \int \phi(x)p(x, \omega)\, d\mu(x) - K$, from which the conclusion follows. Any unbiased test in $\mathfrak{M}_2$ must therefore also be in $U_\alpha$, and conversely. The only other possible competing tests that need be considered are the u.m.p. unbiased level $\alpha'$ tests where $\alpha' < \alpha$. By Section 7 the acceptance region for $\phi_{\alpha'}$ contains the acceptance region for $\phi_\alpha$. Thus the probability of rejection at any $\omega$ for $\phi_{\alpha'}$ is at least as small as that for $\phi_\alpha$. Hence $\rho(\omega, \phi_\alpha) \le \rho(\omega, \phi_{\alpha'})$ for all $\omega \ne \omega_1$.


Sec. 9. The nature of the likelihood ratio test. Assume strictly $\mathcal{P}_2$, and assume $p(x, \omega)$ is a continuous function of $x$ and $\omega$. We prove that the likelihood ratio test is a monotone test.

More explicitly, what we mean by the likelihood ratio test being monotone is the following. Let the null hypothesis be that $\omega \in A$ and the alternative hypothesis be that $\omega \in \Omega$, where $A \cap \Omega = \emptyset$ and $A \cup \Omega = R^1$. Suppose $A$ is the union of $n$ disjoint intervals, some of which may be degenerate, i.e., points. Then

$$I(c) = \left\{ x \;\middle|\; \frac{\sup_{\omega \in A} p(x, \omega)}{\sup_{\omega \in \Omega} p(x, \omega)} \ge c \right\}$$

is the union of at most $n$ disjoint intervals, and hence the associated test belongs to $\mathfrak{M}_n$.

Consider a point $\omega_1$ in one of the intervals of $A$. Let

$$I_{\omega_1} = \{ x \mid p(x, \omega_1) \ge c \sup_{\omega \in \Omega} p(x, \omega) \}.$$

That $I_{\omega_1}$ depends on $c$ will be understood. We have $I_{\omega_1} = \bigcap_{\omega \in \Omega} \{ x \mid p(x, \omega_1) \ge c\, p(x, \omega) \}$. Since $p \in \mathcal{P}_2$, $p(x, \omega_1)/p(x, \omega)$ is a monotone decreasing (increasing) function of $x$ for $\omega > \omega_1$ ($\omega < \omega_1$). Thus $\{ x \mid p(x, \omega_1) \ge c\, p(x, \omega) \}$ is a semi-infinite interval, either to the left or to the right, so $I_{\omega_1}$ is an interval. If $\omega_1$ is not a degenerate interval of $A$, consider another point $\omega_2$ (for definiteness $\omega_1 < \omega_2$) in the same interval of $A$ as $\omega_1$. Then $I_{\omega_2} = \{ x \mid p(x, \omega_2) \ge c \sup_{\omega \in \Omega} p(x, \omega) \}$ is an interval. Either $I_{\omega_1}$ is contained in $I_{\omega_2}$, or $I_{\omega_2}$ contains points of $I_{\omega_1}$ and points greater than those in $I_{\omega_1}$. It cannot happen that $I_{\omega_2}$ contains points less than those in $I_{\omega_1}$ without containing all points of $I_{\omega_1}$. Suppose, to the contrary, that this did happen. Then there exist two points $x_1 > x_2$ such that $x_2 \in I_{\omega_2}$, $x_2 \notin I_{\omega_1}$, $x_1 \in I_{\omega_1}$, and $x_1 \notin I_{\omega_2}$. Since $x_2 \in I_{\omega_2}$, $p(x_2, \omega_2)/p(x_2, \omega) \ge c$ for all $\omega$ in $\Omega$, and since $x_2 \notin I_{\omega_1}$, there exists $\omega' \in \Omega$ such that $p(x_2, \omega_1)/p(x_2, \omega') < c$. Thus $p(x_2, \omega_2) > p(x_2, \omega_1)$. By a similar argument, $p(x_1, \omega_1) > p(x_1, \omega_2)$. This gives $p(x_1, \omega_1)p(x_2, \omega_2) - p(x_1, \omega_2)p(x_2, \omega_1) > 0$, which is impossible by assumption, since $x_1 > x_2$ and $\omega_1 < \omega_2$. Thus the assertion is true.

The continuity of $p(x, \omega)$ in both variables simultaneously implies the following continuity property relating $I_\omega$ and $\omega$. The proof is standard and is omitted.

Property 1. Let $\omega_0$ be a fixed point in a nondegenerate interval $J$ of $A$. For every open interval $U$ properly containing $I_{\omega_0}$ there exists an $\varepsilon > 0$ such that $I_\omega$ is contained in $U$ for all $\omega \in J$ satisfying $|\omega - \omega_0| < \varepsilon$.

Consider any nondegenerate interval $J = (a, b) \subset A$. It will now be shown that $\bigcup_{\omega \in J} I_\omega$ is an interval. Suppose, to the contrary, that there is a point $x^*$ such that $x^* \notin \bigcup_{\omega \in J} I_\omega$ and there exist sets $I_\omega$, $\omega \in J$, both above and below $x^*$. Property 1, together with the fact that for $\omega_1 < \omega_2$ the set $I_{\omega_2}$ contains points less than those in $I_{\omega_1}$ only if $I_{\omega_1} \subset I_{\omega_2}$, shows that the set of $\omega \in J$ for which $I_\omega$ lies above $x^*$ is an open interval if $b \notin J$ and a half-open interval if $b \in J$. Similarly, the set of $\omega \in J$ for which $I_\omega$ lies below $x^*$ is an open interval if $a \notin J$ and a half-open interval if $a \in J$. But if $a, b \notin J$, it is impossible for the interval $J = (a, b)$ to be the union of two disjoint nonempty open intervals. Similar contradictions arise when $a \in J$, $b \notin J$; when $a \notin J$, $b \in J$; and when $a, b \in J$. Therefore $\bigcup_{\omega \in J} I_\omega$ is an interval. The next interval of $A$ to the right of $J$ will produce an interval in the $x$-space to the right of $\bigcup_{\omega \in J} I_\omega$ or including it. Repeating this reasoning, we see that the proof of our proposition is complete.

It has thus been shown that the likelihood ratio test is a monotone test and hence admissible by virtue of Theorem 6.
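The monotone structure can be seen in the simplest case. Assume, for illustration, $X \sim N(\omega, 1)$, a single null interval $A = [a, b]$ and $\Omega = R^1 \setminus A$. The likelihood ratio then has the closed form computed by the hypothetical Python helper below. For $c \le 1$ the acceptance set $I(c)$ is the single interval $[a - t, b + t]$ with $t = \sqrt{-2\ln c}$, and for $c > 1$ it is a single subinterval of $[a, b]$.

```python
from math import exp, log, sqrt


def lr_ratio(x: float, a: float, b: float) -> float:
    """sup_{omega in A} p(x, omega) / sup_{omega in Omega} p(x, omega) for X ~ N(omega, 1),
    A = [a, b] and Omega the complement of A (normalising constants cancel)."""
    if a <= x <= b:
        # numerator attained at omega = x; the supremum over Omega is approached
        # at the endpoint of A nearer to x
        return exp(0.5 * min(x - a, b - x) ** 2)
    dist = a - x if x < a else x - b
    return exp(-0.5 * dist ** 2)


a, b, c = -0.5, 0.5, exp(-2.0)
grid = [-6.0 + 0.01 * k for k in range(1201)]
accepted = [i for i, x in enumerate(grid) if lr_ratio(x, a, b) >= c]
t = sqrt(-2.0 * log(c))  # predicted I(c) = [a - t, b + t] for c <= 1
print(grid[accepted[0]], grid[accepted[-1]], a - t, b + t)
```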







Part III. Minimax Strategies for Nature and the Statistician in the General Two-action Problem. A brief summary of the results already obtained in [2] for the one-sided testing problem will first be given for the sake of completeness. The parameter space $\Omega$ is an interval $(c, d)$ of the real line; $c$ may be $-\infty$, $d$ may be $+\infty$, and the interval may be open or closed at $c$ or $d$ if either or both are finite. There is a point $\omega_0 \in (c, d)$ such that action 1 is preferred for $\omega \le \omega_0$ and action 2 is preferred for $\omega \ge \omega_0$. The two loss functions $L_1(\omega)$ and $L_2(\omega)$ are continuous, with $L_1(\omega) = 0$ for $\omega \le \omega_0$, $L_1(\omega) > 0$ for $\omega > \omega_0$, and $L_2(\omega) = 0$ for $\omega \ge \omega_0$, $L_2(\omega) > 0$ for $\omega < \omega_0$. The risk function $\rho$ is given by

$$\rho(F, \phi) = \iint [L_1(\omega)\phi(x) + L_2(\omega)(1 - \phi(x))]\, p(x, \omega)\, d\mu(x)\, dF(\omega), \tag{4.1}$$

where $F$ is the randomized strategy (a priori distribution) for nature and $\phi$ is the randomized strategy for the statistician. $p$ is assumed to be strictly Pólya Type 2. The following two conditions were required:

Condition 1. If $\Omega$ is open at $d$ and $a$ is in the interior of the convex hull of the spectrum of $\mu$, then as $\omega \to d$,

$$L_1(\omega) \int_{-\infty}^{a} P(x, \omega)\, d\mu(x) \to 0.$$

Condition 2. If $\Omega$ is open at $c$ and $b$ is in the interior of the convex hull of the spectrum of $\mu$, then as $\omega \to c$,

$$L_2(\omega) \int_{b}^{\infty} P(x, \omega)\, d\mu(x) \to 0.$$

Under the above assumptions it was shown in [2] that the game $G = (\{F\}, \{\phi\}, \rho)$ has a value and that both nature and the statistician have minimax strategies. Moreover, the statistician has a monotone minimax strategy, and nature has a minimax strategy which concentrates at just two points.


Sec. 1. Minimax theorem in the case when $\Omega$ is closed. Our first objective is to present the basic minimax theorem for the general two-action problem in the case where the parameter space $\Omega$ is closed. We deal with the situation where there exist $n + 1$ distinct intervals, arranged in order, in which actions 1 and 2 are successively preferred. When $n = 1$, our general preference pattern reduces to the classical one-sided test of a hypothesis. For $n = 2$, we are treating the classical two-sided testing problem. We assume throughout that $L_1(\omega)$ and $L_2(\omega)$ are both continuous. The fundamental preliminary minimax theorem is the following.

THEOREM 7. If the parameter space $\Omega$ is closed, then the game defined by the risk function $\rho(F, \phi)$ is determined (has a value), the statistician possesses a monotone minimax strategy, and nature has a minimax strategy involving at most $n + 1$ points of increase; i.e., nature's minimax distribution concentrates on at most $n + 1$ points. Here $n + 1$ is the total number of disconnected preference regions of both actions.

Proof. As $\Omega$ is closed, the space of distributions $F$ over $\Omega$ is compact in the weak* topology with respect to the continuous functions on $\Omega$. This is essentially the Helly selection theorem. The space of strategies $\phi$ in the two-action problem is also compact in the weak* topology over the integrable functions on $X$. Obviously, $\rho(F, \phi)$ is linear and continuous with respect to the appropriate weak* topologies, and thus optimal strategies $F^0$ and $\phi^0$ exist and the game $\rho(F, \phi)$ has a value (see [6]).

As the class of all monotone strategies constitutes a complete class [1], there exists a monotone strategy $\phi^*$ which improves uniformly on $\phi^0$ in terms of risk, and hence $\phi^*$ is minimax. Let

$$T = \{ \omega \mid \rho(\omega, \phi^*) = \max_{\omega} \rho(\omega, \phi^*) = v \},$$

where $v$ is the value of the game. We must distinguish between $n$ odd and $n$ even. The analysis will be made for $n$ odd; the details for $n$ even are left for the reader to supply. Suppose, for definiteness, that the monotone strategy $\phi^*$ has the form

$$\phi^*(x) = \begin{cases} 1, & x_{2i-1} < x < x_{2i}, \\ 0, & x_{2i} < x < x_{2i+1}, \end{cases}$$

with $x_1 = -\infty$ and $x_{2m+1} = +\infty$, and where the $x_i$ are distinct. In other words, there are $2m$ disconnected intervals where different preferences of actions 1 and 2 are desired. Of course, $m$ is limited such that $2m \le n + 1$ (see Theorem 6).

We now assert that $T$ meets at least $2m$ alternate intervals where actions 1 and 2 are successively preferred. Suppose the contrary, and consider

$$\int p(x, \omega)\,[L_1(\omega) - L_2(\omega)]\, dF^0(\omega). \tag{4.2}$$

As $F^0$ must concentrate its full measure on $T$, and the only sign changes of $L_1(\omega) - L_2(\omega)$ occur as we pass from one preference region to another, we infer that $[L_1(\omega) - L_2(\omega)]\, dF^0(\omega)$ changes sign fewer than $2m - 1$ times. Thus, by Theorem 3, (4.2) must change sign fewer than $2m - 1$ times. However, $\phi^*$ is Bayes against $F^0$ and must therefore take the values 1 or 0 according as (4.2) is negative or positive. Thus $\phi^*$ cannot have the form indicated. This contradiction establishes the assertion made above about $T$.

Select a set $T^*$ of $2m$ points in $T$, each belonging to a different preference region, such that $L_1(\omega) - L_2(\omega)$ changes sign $2m - 1$ times as these points are traversed. By Theorem 5 of [1] there exists a distribution $F^*$ which fully concentrates on $T^*$ and against which $\phi^*$ is Bayes. We now show that $F^*$ is minimax. As $F^*$ concentrates on $T^* \subset T$, we get $v = \rho(F^*, \phi^*)$. Using the Bayesian nature of $\phi^*$ for $F^*$, we obtain

$$v = \rho(F^*, \phi^*) \le \rho(F^*, \phi) \quad \text{for all strategies } \phi.$$

The proof of the theorem is thus complete.
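The Bayes property used in this proof can be illustrated in the one-sided case. Under the risk (4.1), $\phi$ is the probability of taking action 1, so a Bayes strategy against $F$ takes action 1 where (4.2) is negative. The Python sketch below makes several assumptions for illustration only: $X \sim N(\omega, 1)$, $\omega_0 = 0$, $L_1(\omega) = \max(\omega, 0)$, $L_2(\omega) = \max(-\omega, 0)$, and a two-point prior. All helper names are hypothetical. Because this family is strictly Pólya Type, (4.2) changes sign only once, so the Bayes rule is monotone, with a switch point available in closed form.

```python
from math import exp, log


def expr_4_2(x, prior, L1, L2):
    """(4.2) for a finite prior F = [(omega, weight), ...] and X ~ N(omega, 1).
    The density's normalising constant is dropped: only the sign matters."""
    return sum(wt * exp(-0.5 * (x - om) ** 2) * (L1(om) - L2(om)) for om, wt in prior)


def bayes_phi(x, prior, L1, L2):
    # Under (4.1) phi is the probability of action 1; take it where (4.2) < 0.
    return 1.0 if expr_4_2(x, prior, L1, L2) < 0.0 else 0.0


def two_point_threshold(prior, L1, L2):
    """Closed-form switch point for a prior on omega_a < omega_0 < omega_b
    (solves L2(a) w_a p(x, a) = L1(b) w_b p(x, b) for the normal density)."""
    (a, wa), (b, wb) = prior
    return 0.5 * (a + b) + log(L2(a) * wa / (L1(b) * wb)) / (b - a)


def L1(w):  # loss of action 1, zero for w <= omega_0 = 0
    return max(w, 0.0)


def L2(w):  # loss of action 2, zero for w >= omega_0 = 0
    return max(-w, 0.0)


prior = [(-1.0, 0.5), (1.5, 0.5)]
grid = [-4.0 + 0.001 * k for k in range(8001)]
acts = [bayes_phi(x, prior, L1, L2) for x in grid]
switches = [i for i in range(1, len(acts)) if acts[i] != acts[i - 1]]
print(len(switches), grid[switches[0]], two_point_threshold(prior, L1, L2))
```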







Sec. 2. Two-sided minimax theorem. Our next task is to eliminate the restriction that $\Omega$ is closed. For this purpose we need to impose some further conditions on the family of densities $p(x, \omega)$. To expedite and clarify the reasoning, we restrict ourselves to the two-sided problem. A similar analysis would apply to the general two-action problem.

Reviewing the basic assumptions, we have that the parameter space $\Omega$ is an interval $(c, d)$ of the real line; $c$ may be $-\infty$, $d$ may be $+\infty$, and the interval may be open or closed at $c$ or $d$ if either or both are finite. There exist two points $\omega_1, \omega_2 \in (c, d)$ such that action 1 is preferred for $\omega \le \omega_1$ and $\omega \ge \omega_2$, and action 2 is preferred for $\omega_1 \le \omega \le \omega_2$. The two loss functions $L_1$ and $L_2$ are continuous, with $L_1(\omega) = 0$ for $\omega \le \omega_1$ and $\omega \ge \omega_2$, $L_1(\omega) > 0$ for $\omega_1 < \omega < \omega_2$, and $L_2(\omega) = 0$ for $\omega_1 \le \omega \le \omega_2$, $L_2(\omega) > 0$ for $\omega < \omega_1$ and $\omega > \omega_2$. There is no loss of generality in taking the loss function equal to zero where the action is preferred, as differences of the loss functions are the only relevant quantities involved. The risk function is again given by (4.1). This time $p$ is assumed to be strictly Pólya Type 3.

The assertion that will be proven under certain hypotheses of smoothness is that the game $G = (\{F\}, \{\phi\}, \rho)$ has a value and both players have minimax strategies. The statistician has a monotone minimax strategy, and nature has a minimax strategy which concentrates on at most 3 points. To establish this assertion we impose three conditions:

Condition A. If $\Omega$ is open at $d$ and $a$ is interior to the convex hull of the spectrum of $\mu$, then as $\omega \to d$,

$$L_2(\omega) \int_{-\infty}^{a} P(x, \omega)\, d\mu(x) \to 0.$$

Condition B. If $\Omega$ is open at $c$ and $b$ is interior to the convex hull of the spectrum of $\mu$, then as $\omega \to c$,

$$L_2(\omega) \int_{b}^{\infty} P(x, \omega)\, d\mu(x) \to 0.$$

These conditions require that if either endpoint is open, then as $\omega$ tends to this endpoint the mass of probability shifts away from the opposite end of the axis in such a way that the probability at the opposite end of the axis tends to zero at a faster rate than the loss $L_2$ blows up. These conditions are similar to those imposed in the one-sided problem.

Condition C. Let $l(c) = \lim_{\omega \to c} L_2(\omega)$ and $l(d) = \lim_{\omega \to d} L_2(\omega)$ (the existence of the limits is postulated).

(i) $l = \min(l(c), l(d)) > \max_{\omega_1 \le \omega \le \omega_2} L_1(\omega)$.

(ii) If $\Omega$ is open at $c$, then $l(c) > \max_{a \le \omega \le b} L_2(\omega)$ for any closed interval $[a, b]$ contained in $\Omega$.

(iii) If $\Omega$ is open at $d$, then $l(d) > \max_{a \le \omega \le b} L_2(\omega)$ for any closed interval $[a, b]$ contained in $\Omega$.

Condition C has essentially the effect of eliminating the possibility that nature will desire to concentrate at the ends of the parameter space in choosing an optimal strategy. This condition is fulfilled, for instance, when the losses tend to $\infty$ at both ends.

Suppose $\Omega$ is not a closed interval (the closed case was treated in Theorem 7). Then the result is not immediate, because the space $\{F\}$ is not compact. In fact, the result is no longer true unless, for example, the three conditions A, B, and C are imposed on the problem; that is, some conditions are necessary. The method of proof consists in considering the sequence of games $G^n = (\{F^n\}, \{\phi\}, \rho)$, where $\Omega^n = [\omega_n', \omega_n'']$ and $\omega_n' \to c$, $\omega_n'' \to d$. That is, we consider a sequence of games defined over closed intervals contained in $\Omega$ which in the limit approach $\Omega$. If one end of $\Omega$ is closed, say $d$, then $\omega_n'' = d$ for all $n$. Each game $G^n$ has a value, and, by Theorem 7, for each game $G^n$ the statistician has a monotone minimax strategy $\phi^n$ and nature has a minimax strategy (distribution) $F^n$ which concentrates on at most three points. The problem is to show that subsequences can be selected from $\{\phi^n\}$ and $\{F^n\}$ which converge to strategies yielding the desired properties in the original game $G$.

Let $v_n$ be the value of the game $G^n$. The sequence of $v_n$'s is bounded away from zero. Indeed, consider the closed interval $[\omega_n', \omega_n'']$. Choose three points $\omega'$, $\omega''$, and $\omega'''$ such that $\omega_n' < \omega' < \omega_1$, $\omega_1 < \omega'' < \omega_2$, and $\omega_2 < \omega''' < \omega_n''$. (We assume, of course, that $\omega_n' < \omega_1$ and $\omega_2 < \omega_n''$; the game $G^n$ has no meaning otherwise.) Let $F'$ be the strategy for nature which plays $\omega'$, $\omega''$, and $\omega'''$ with equal probability. Clearly $\rho(F', \phi) \ge a > 0$ for all $\phi$, where $a = \min(L_2(\omega'), L_1(\omega''), L_2(\omega'''))$. Hence $v_n \ge a > 0$ for all $n$.

Let $T_n = \{ \omega \in [\omega_n', \omega_n''] \mid \rho(\omega, \phi^n) = v_n \}$. $T_n$ contains points in both preference regions; i.e., $T_n$ contains points $\omega$ in the interval $(\omega_1, \omega_2)$, and $T_n$ also meets at least one of the intervals $(c, \omega_1)$, $(\omega_2, d)$. Suppose not, and suppose $T_n \subset (\omega_1, \omega_2)$. $F^n$ must concentrate at points of $T_n$. Let $\phi_0(x) = 0$ for all $x$. Then $\rho(F^n, \phi_0) = 0$, which contradicts the fact that $v_n > 0$. An analogous argument using $\phi_1(x) = 1$ eliminates the possibility that $T_n$ contains no points of $(\omega_1, \omega_2)$.

The monotone minimax strategies $\phi^n$ are characterized by two points $x_n, y_n$ ($x_n < y_n$). In order to show that subsequences of $\{\phi^n\}$ and $\{F^n\}$ can be selected which converge to minimax strategies in the original game $G$, we need the fact that the sets $T_n$ are bounded away from the open end or ends of $\Omega$. This is established by showing that the $x_n$'s or $y_n$'s or both are bounded away from $c'$ and $d'$, the ends of the spectrum of $\mu$; $c'$ may be $-\infty$ and $d'$ may be $+\infty$.

Suppose $\Omega$ is open at $c$; $\Omega$ can be open or closed at $d$. We assert that $c'$ cannot be a limit point of the sequence $\{x_n\}$. Suppose there were a subsequence $\{x_{n_i}\}$ of $\{x_n\}$ with the limit $c'$.

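Case 1. $c'$ is a limit point of the sequence $\{y_{n_i}\}$.

There exists a subsequence $\{\phi^{n_i}\}$ of strategies such that $\{x_{n_i}\}$ and $\{y_{n_i}\}$ each have the limit $c'$. For $\omega < \omega_1$ and $\omega > \omega_2$,

$$\rho(\omega, \phi^{n_i}) = L_2(\omega) \int_{x_{n_i}}^{y_{n_i}} P(x, \omega)\, d\mu(x).$$

Therefore $\rho(\omega, \phi^{n_i}) \to 0$ as $n_i \to \infty$ for $\omega < \omega_1$ and $\omega > \omega_2$. Since $v_n \ge a > 0$ for all $n$, this means that $T_{n_i}$ is totally contained in the interval $(\omega_1, \omega_2)$, a contradiction.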







Case 2. $c'$ is not a limit point of $\{y_{n_i}\}$.

Then there exists a $y_0$, interior to the spectrum of $\mu$, such that $y_{n_i} \ge y_0$ for all $n_i$.

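Lemma A. Under the condition that $x_{n_i} \to c'$, there exists a limit point of $\{v_{n_i}\}$ that is $\ge l(c)$.

Proof. By Condition B, given $\eta > 0$ there exists $M(\eta)$ such that $\int_{-\infty}^{y_0} P(x, \omega)\, d\mu(x) \ge 1 - \eta$ for $\omega < M(\eta)$. Given $\varepsilon > 0$ there exists $\omega_\varepsilon < M(\eta)$ such that $L_2(\omega_\varepsilon) > l(c) - \varepsilon$ (or, if $l(c) = \infty$, for arbitrarily large $K$ there exists $\omega_K < M(\eta)$ such that $L_2(\omega_K) > K$). For $\omega_\varepsilon$ (or $\omega_K$) there exists $N(\omega_\varepsilon, \varepsilon')$ such that

$$\left| \int_{x_{n_i}}^{y_0} P(x, \omega_\varepsilon)\, d\mu(x) - \int_{-\infty}^{y_0} P(x, \omega_\varepsilon)\, d\mu(x) \right| < \varepsilon'$$

for $n_i \ge N(\omega_\varepsilon, \varepsilon')$. Therefore, when $l(c) < \infty$,

$$\rho(\omega_\varepsilon, \phi^{n_i}) = L_2(\omega_\varepsilon) \int_{x_{n_i}}^{y_{n_i}} P(x, \omega_\varepsilon)\, d\mu(x) \ge (l(c) - \varepsilon)(1 - \eta - \varepsilon')$$

for $n_i \ge N(\omega_\varepsilon, \varepsilon')$. But $\varepsilon$, $\eta$, $\varepsilon'$ are arbitrary constants, so the assertion follows. When $l(c) = \infty$, $\rho(\omega_K, \phi^{n_i}) \ge K(1 - \eta - \varepsilon')$, and the assertion still holds.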
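Lemma B. Under the same conditions as in Lemma A, $v_{n_i} \le l(c) - \beta$ for all $n_i$, where $\beta > 0$.

Proof. Let $\phi'$ be a monotone strategy for which the characterizing points $x'$ and $y'$ are interior to the spectrum of $\mu$. Then

$$\rho(\omega, \phi') = L_1(\omega) \left[ \int_{-\infty}^{x'} P(x, \omega)\, d\mu(x) + \int_{y'}^{\infty} P(x, \omega)\, d\mu(x) \right] + L_2(\omega) \int_{x'}^{y'} P(x, \omega)\, d\mu(x).$$

Let $a = l(c) - \max_{\omega_1 \le \omega \le \omega_2} L_1(\omega)$. Then $a > 0$ by Condition C, and

$$L_1(\omega) \left[ \int_{-\infty}^{x'} P(x, \omega)\, d\mu(x) + \int_{y'}^{\infty} P(x, \omega)\, d\mu(x) \right] \le l(c) - a.$$

If $\Omega$ is also open at $d$, then by virtue of Conditions A and B, for every $\varepsilon > 0$ there exist two constants $H(\varepsilon)$ and $K(\varepsilon)$ such that $L_2(\omega)\int_{x'}^{y'} P(x, \omega)\, d\mu(x) < \varepsilon$ for $\omega < H(\varepsilon)$ and $\omega > K(\varepsilon)$. For $H(\varepsilon) \le \omega \le K(\varepsilon)$, the second term in $\rho(\omega, \phi')$ is $\le l(c) - b$ for some $b > 0$ by Condition C. Thus, taking $\varepsilon < l(c)$, $\rho(\omega, \phi') \le \max(l(c) - a, \varepsilon, l(c) - b) < l(c)$ for all $\omega$, and $v_{n_i} \le \max(l(c) - a, \varepsilon, l(c) - b)$ for all $n_i$. If $\Omega$ is closed at $d$, by Condition B there exists $H(\varepsilon)$ such that $L_2(\omega)\int_{x'}^{y'} P(x, \omega)\, d\mu(x) < \varepsilon$ for $\omega < H(\varepsilon)$. For $H(\varepsilon) \le \omega \le d$ this term is $\le l(c) - b$ for some $b > 0$ by Condition C. Again $v_{n_i} \le l(c) - \beta$, where $\beta = l(c) - \max(l(c) - a, \varepsilon, l(c) - b) > 0$.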

But Lemmas A and B are contradictory assertions. Hence the original assumption that $c'$ was a limit point of $\{x_n\}$ is untenable.

An analogous argument shows that if $\Omega$ is open at $d$, the sequence $\{y_n\}$ cannot have $d'$ as a limit point.

The required theorem follows almost immediately.

Lemma C. If $\Omega$ is open at $c$ and $d$, there exist two constants $C_1$ and $C_2$ such that $c < C_1 < C_2 < d$ and $T_n \subset [C_1, C_2]$ for all $n$. If $\Omega$ is open at $c$ ($d$) and closed at $d$ ($c$), there exists a constant $C_1$ ($C_2$) such that $c < C_1$ ($C_2 < d$) and $T_n \subset [C_1, d]$ ($[c, C_2]$) for all $n$.

Proof. For $\omega$ near $c$ and $d$, $\rho(\omega, \phi^n) = L_2(\omega)\int_{x_n}^{y_n} P(x, \omega)\, d\mu(x)$. By the previous discussion, if $\Omega$ is open at both ends there exist constants $x_0$ and $y_0$ interior to the spectrum of $\mu$ such that $x_0 \le x_n$ and $y_n \le y_0$ for all $n$. Hence $\rho(\omega, \phi^n) \le L_2(\omega)\int_{x_0}^{y_0} P(x, \omega)\, d\mu(x)$. By Conditions A and B there exist constants $C_1(\eta)$ and $C_2(\eta)$ such that

$$L_2(\omega) \int_{x_0}^{y_0} P(x, \omega)\, d\mu(x) < \eta$$

for $c < \omega < C_1(\eta)$ and $C_2(\eta) < \omega < d$. Choose $\eta = a/2$, where $a$ is the lower bound of the values ($v_n \ge a > 0$ for all $n$). Thus $T_n$ cannot have points below $C_1 = C_1(a/2)$ or above $C_2 = C_2(a/2)$. A similar argument works when $\Omega$ is open at just one end.

For each game $G^n = (\{F^n\}, \{\phi\}, \rho)$ in which $\Omega$ is open at both ends there is a triplet $(F^n, \phi^n, v_n)$, where $F^n$ is the minimax strategy for nature, which concentrates at three points, $\phi^n$ is the monotone minimax strategy for the statistician, characterized by two points, and $v_n$ is the value of the game. $F^n$ concentrates at exactly three points (and not merely at at most three), for it has been shown that $\phi^n$ defines a split selection region for action 1, and by virtue of Theorem 5 and the fact that $\phi^n$ is Bayes against $F^n$ this would be impossible unless $F^n$ concentrates at three points. A subsequence $\{\phi^{n_k}\}$ can be selected which converges to a monotone strategy $\phi^*$ for the statistician. $\phi^*$ also defines a split selection region for action 1, since the $x_n$'s and $y_n$'s are bounded away from the ends of the spectrum of $\mu$. Since the $T_n$'s are contained in a closed interval contained in $\Omega$, a subsubsequence $\{F^{n_{k_j}}\}$ can be selected which converges to an at most three-point distribution $F^*$ for nature. Finally a further subsequence of $\{v_{n_{k_j}}\}$ can be chosen which converges to a value $v$. Since $\rho(F, \phi^n) \le v_n$, the Lebesgue convergence theorem gives $\rho(F, \phi^*) \le v$ for every $F$. Similarly $\rho(F^n, \phi) \ge v_n$, so $\rho(F^*, \phi) \ge v$ for every $\phi$. Thus $v$ is the value of the game, and $F^*$ and $\phi^*$ are minimax strategies for nature and the statistician, respectively. $F^*$ concentrates at exactly three points since $\phi^*$ defines a split selection region for action 1.




When $\Omega$ is closed at one end, analogous arguments prove the existence of minimax strategies $\phi^*$ and $F^*$, where $F^*$ concentrates at at most three points.

Summing up the foregoing results, we have 

THEOREM 8. If conditions A, B, and C are satisfied and no other restriction on the parameter space $\Omega$ is made, then the game with payoff kernel $\rho(F, \phi)$ has a value, and optimal strategies exist with the same properties as those given in Theorem 7.


Sec. 3. Computation of minimax strategies. The previous discussion has been 
an existence discussion, and no mention was made of how the statistician’s 
monotone minimax strategy can be found or constructed. The remainder of this 
section is devoted to giving two general methods for constructing the minimax 
strategy—one for the one-sided problem and one for the two-sided problem. 





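In the one-sided problem

$$\rho(\omega, \phi) = \begin{cases} L_1(\omega)\displaystyle\int_{-\infty}^{x_0} P(x, \omega)\,d\mu(x), & \omega \ge \omega_0, \\[4pt] L_2(\omega)\displaystyle\int_{x_0}^{\infty} P(x, \omega)\,d\mu(x), & \omega < \omega_0, \end{cases}$$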


where $x_0$ is the point characterizing the monotone strategy $\phi$. As $x_0$ decreases, $L_1(\omega)\int_{-\infty}^{x_0} P(x, \omega)\,d\mu(x)$ decreases for each $\omega \ge \omega_0$ and $L_2(\omega)\int_{x_0}^{\infty} P(x, \omega)\,d\mu(x)$ increases for each $\omega \le \omega_0$. The method is now obvious. Choose an arbitrary $x_0$. If

$$\max_{\omega \le \omega_0} L_2(\omega)\int_{x_0}^{\infty} P(x, \omega)\,d\mu(x) < \max_{\omega \ge \omega_0} L_1(\omega)\int_{-\infty}^{x_0} P(x, \omega)\,d\mu(x),$$

decrease $x_0$ until the maximums are equal. If the reverse is true, increase $x_0$. The $x_0$ which yields equal maximums above and below $\omega_0$ defines the monotone minimax strategy. There is no danger of the maximums not existing since, by conditions 1 and 2, $\rho(\omega, \phi) \to 0$ as $\omega \to c, d$.
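The bisection rule just described can be sketched numerically. The sketch is not taken from the paper: it assumes, purely for illustration, a normal location family ($P(x, \omega)$ the $N(\omega, 1)$ density, $\mu$ Lebesgue measure), $\omega_0 = 0$, hypothetical loss weights $L_1(\omega) = 2\omega$ and $L_2(\omega) = -\omega$, and it approximates each maximum over a finite grid of $\omega$ values.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: P(x, w) is the N(w, 1) density and w0 = 0.
W0 = 0.0

def L1(w):
    return 2.0 * (w - W0)   # illustrative loss weight used for w >= w0

def L2(w):
    return W0 - w           # illustrative loss weight used for w < w0

# Finite grids standing in for {w >= w0} and {w < w0}; rho -> 0 at both ends.
GRID_HI = [W0 + 0.01 * k for k in range(1, 1001)]
GRID_LO = [W0 - 0.01 * k for k in range(1, 1001)]

def upper_max(x0):
    """max over w >= w0 of L1(w) * P_w(X <= x0); nondecreasing in x0."""
    return max(L1(w) * norm_cdf(x0 - w) for w in GRID_HI)

def lower_max(x0):
    """max over w < w0 of L2(w) * P_w(X > x0); nonincreasing in x0."""
    return max(L2(w) * (1.0 - norm_cdf(x0 - w)) for w in GRID_LO)

def one_sided_cutoff(lo=-10.0, hi=10.0, tol=1e-10):
    """Bisect on x0 until the two maxima agree, following the rule in the text."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lower_max(mid) < upper_max(mid):
            hi = mid   # maximum above w0 is larger: decrease x0
        else:
            lo = mid   # maximum below w0 is larger: increase x0
    return 0.5 * (lo + hi)

x0 = one_sided_cutoff()
```

With $L_1 = L_2$ the grids are symmetric and the cutoff would be 0; the heavier weight placed on the upper side in this example pushes $x_0$ below 0.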

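In the two-sided problem

$$\rho(\omega, \phi) = \begin{cases} L_1(\omega)\left[\displaystyle\int_{-\infty}^{x_0} P(x, \omega)\,d\mu(x) + \int_{y_0}^{\infty} P(x, \omega)\,d\mu(x)\right], & \omega_1 \le \omega \le \omega_2, \\[4pt] L_2(\omega)\displaystyle\int_{x_0}^{y_0} P(x, \omega)\,d\mu(x), & \omega < \omega_1 \text{ or } \omega > \omega_2, \end{cases}$$

where $x_0, y_0$ are the points characterizing the monotone strategy $\phi$. Assume $\Omega$ is open at both ends. Choose an arbitrary $x_0$. Determine $y_0$ (as a function of $x_0$) so that

$$\max_{\omega \le \omega_1} L_2(\omega)\int_{x_0}^{y_0} P(x, \omega)\,d\mu(x) = \max_{\omega \ge \omega_2} L_2(\omega)\int_{x_0}^{y_0} P(x, \omega)\,d\mu(x).$$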

This cannot be done for all $x_0$. As $\omega \to d$, $L_2(\omega)\int_{x_0}^{\infty} P(x, \omega)\,d\mu(x) \to l(d)$, and $l(d) > \max_{a \le \omega \le b} L_2(\omega)$ for any closed interval $[a, b] \subset \Omega$, so that for $y_0$ sufficiently large $\max_{\omega \ge \omega_2} > \max_{\omega \le \omega_1}$. Both maximums equal 0 for $y_0 = x_0$, so unless $\max_{\omega \ge \omega_2} > \max_{\omega \le \omega_1}$ for all $y_0 > x_0$, there will be equality at some point. We have $\max_{\omega \ge \omega_2} > \max_{\omega \le \omega_1}$ for all $y_0 > x_0$ when $x_0$ is chosen too close to $d$. In this case decrease $x_0$ until it is possible to determine $y_0$ so that the maximums are equal. There will be some point $x_m$ such that for all $x_0 \le x_m$ a $y_0$ can be found. Now vary $x_0$ in the appropriate direction until the outer maximums are equal to

$$\max_{\omega_1 \le \omega \le \omega_2} L_1(\omega)\left[\int_{-\infty}^{x_0} P(x, \omega)\,d\mu(x) + \int_{y_0(x_0)}^{\infty} P(x, \omega)\,d\mu(x)\right].$$

The $x_0$ and corresponding $y_0$ which give three equal maximums determine the monotone minimax strategy for the statistician.







A further useful fact is that the monotone minimax strategy for the statistician 
is unique. This can be demonstrated with the aid of Lemma 4. 


REFERENCES 

[1] S. Karlin, "Decision theory for Pólya type distributions; case of two actions, I," Third Berkeley Symposium on Probability and Statistics, Vol. 1, pp. 115-129.

[2] S. Karlin and H. Rubin, "The theory of decision procedures for distributions with monotone likelihood ratio," Ann. Math. Stat., Vol. 27 (1956), pp. 272-299.

[3] Z. W. Birnbaum, "On random variables with comparable peakedness," Ann. Math. Stat., Vol. 19 (1948), pp. 76-81.

[4] E. L. Lehmann, Notes on Testing Hypothesis, University of California, 1950.

[5] H. Chernoff, "A property of some type A regions," Ann. Math. Stat., Vol. 22 (1951), pp. 472-474.

[6] S. Karlin, "The theory of infinite games," Ann. Mathematics, Vol. 24 (1953), pp. 371-401.





ON MINIMIZING AND MAXIMIZING A CERTAIN INTEGRAL WITH STATISTICAL APPLICATIONS¹,²

By JAGDISH SHARAN RUSTAGI

Carnegie Institute of Technology


1. Summary. We consider here the problem of minimizing and maximizing $\int_{-X}^{X} \varphi(x, F(x))\,dx$ under the assumptions that $F(x)$ is a cumulative distribution function (cdf) on $[-X, X]$ with the first two moments given and that $\varphi$ is a known function having certain properties. The existence of the solution is proved, and a characterization of the maximizing and minimizing cdf's is given. The minimizing cdf is unique when $\varphi(x, y)$ is strictly convex in $y$ and is completely characterized for some special forms of $\varphi$. The maximizing cdf is a discrete distribution and in the above case turns out to be a three-point distribution. Several statistical applications are discussed.


2. Introduction. Let $x_1 \le x_2 \le \cdots \le x_n$ be $n$ ordered independent observations from a population with cdf $F(x)$ having standard deviation $\sigma$. Let $w_n = x_n - x_1$ denote the sample range. Then it is well known that

$$E(w_n) = \int x\,d\{F^n(x) + (1 - F(x))^n\}, \tag{2.1}$$

$$E(x_n) = \int x\,d\{F^n(x)\}. \tag{2.2}$$

Plackett [9] considered the problem of establishing universal upper and lower bounds for $E(w_n)/\sigma$ along the lines of Chebycheff inequalities for moments. Moriguti [14] considered an equivalent case of establishing bounds for $E(x_n)$, but he assumed that the underlying distribution is symmetrical.

Gumbel [10] uses a variational method to derive the solution of the problem of maximizing $E(w_n)$ and $E(x_n)$ over the class of continuous cdf's with given mean and variance, and gives a sort of sufficiency condition. Hartley and David [1] consider the same problem of maximizing $E(x_n)$ as in [10] and obtain the solution of the problem of maximizing and minimizing $E(w_n)$, but they assume, in addition, that $F(x)$ is a cdf on the bounded range $[-X, X]$.

Integrating (2.1) and (2.2) by parts (for a cdf on $[-X, X]$ this gives $E(x_n) = X - \int_{-X}^{X} F(x)^n\,dx$ and $E(w_n) = 2X - \int_{-X}^{X}[F(x)^n + (1 - F(x))^n]\,dx$), we find that the problems of maximizing (minimizing) $E(w_n)$ and $E(x_n)$ are the same as those of minimizing (maximizing)

$$\int_{-X}^{X} [F(x)^n + (1 - F(x))^n]\,dx \quad\text{and}\quad \int_{-X}^{X} F(x)^n\,dx,$$

respectively, with appropriate restrictions on $F(x)$. We see here that $\varphi(x, y) = y^n + (1 - y)^n$ or $\varphi(x, y) = y^n$ is strictly convex in $y$ for $0 \le y \le 1$.

Received April 6, 1956; revised July 17, 1956.
¹ This paper is based on the author's doctoral dissertation, accepted by Stanford University.
² This research was done under the sponsorship of the Office of Naval Research.
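As a numerical check of the integrated-by-parts forms, one can use $E(x_n) = X - \int_{-X}^{X} F(x)^n\,dx$ and $E(w_n) = 2X - \int_{-X}^{X}[F(x)^n + (1 - F(x))^n]\,dx$, which follow from (2.1) and (2.2) because $F(X) = 1$ and $F(-X-) = 0$. The sketch below (an illustration, not part of the paper) evaluates them by a midpoint rule for the uniform law on $[-X, X]$, whose exact values $X(n-1)/(n+1)$ and $2X(n-1)/(n+1)$ are known.

```python
def midpoint_integral(f, a, b, m=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def expected_max_and_range(F, X, n):
    """E(x_n) and E(w_n) for a cdf F on [-X, X], via the integrated-by-parts forms."""
    e_max = X - midpoint_integral(lambda x: F(x) ** n, -X, X)
    e_range = 2 * X - midpoint_integral(
        lambda x: F(x) ** n + (1 - F(x)) ** n, -X, X)
    return e_max, e_range

X, n = 1.0, 5

def uniform_cdf(x):
    return (x + X) / (2 * X)     # uniform law on [-X, X]

e_max, e_range = expected_max_and_range(uniform_cdf, X, n)
# Exact values for the uniform law: X(n-1)/(n+1) = 2/3 and 2X(n-1)/(n+1) = 4/3.
```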

There are many other situations in statistics where problems of maximizing and minimizing an integral of a strictly convex function of $F(x)$ occur. In the evaluation of efficiencies of various nonparametric tests of hypotheses, we are faced with integrals of the above type. For example, Birnbaum and Klose [7] have derived a lower bound for the variance of the Mann-Whitney statistic (an improvement on the lower bound due to van Dantzig), which is based on minimizing $\int_0^1 [F(x) - x]^2\,dx$, where $F(x)$ is a cdf on $0 \le x \le 1$ and $\int_0^1 F(x)\,dx = 1 - p$. Here again $[F(x) - x]^2$ is strictly convex in $F(x)$.

The above problems suggest a generalization.

We consider the problem of maximizing and minimizing an integral of $\varphi(x, F(x))$, where $\varphi(x, y)$ is strictly convex in $y$. A very special case of this is the one where $\varphi(x, y)$ is a function of $y$ alone; it includes the important applications of minimizing and maximizing $E(w_n)$, $E(x_n)$, etc. Many other authors, to name a few, Chernoff and Reiter [2], Rubin and Isaacson [13], Karlin [3, 4], Hoeffding [11], Hoeffding and Shrikhande [15], and Brunk, Ewing and Utz [6], have also considered related problems.

We have used Karlin's [3, 4] technique in the solution of the minimum problem. We compare our technique with that of [2] and [3] in Section 6 and obtain results similar to those in [2] and [3]. The maximum problem is discussed in Section 8. We find that the maximizing cdf is a discrete distribution and, in our case, a three-point distribution. David and Hartley [1] have shown further that the minimizing cdf for $E(w_n)$ with given restrictions on mean and variance is a two-point distribution. This does not seem to be true in general. Similar results were obtained in [2], [11], and [15].

The results and techniques of our paper have many other applications besides 
those discussed above. Many classical inequalities of the Chebycheff type can 
be obtained with the help of our results. In Section 7 we discuss an example 
where the techniques of this paper yield the solution to a problem of a 
different type. 


3. Statement of the problem and existence of its solution. Let $S = \{(x, y): -X \le x \le X,\ 0 \le y \le 1\}$, where $X$ is already specified.

Let $\varphi$ be a function defined on the closed and bounded region $S$ such that

(1) $\varphi$ is bounded and continuous in $S$,

(2) $\varphi$ is strictly convex and twice differentiable in $y$.

We shall minimize (maximize)

$$I(F) = \int_{-X}^{X} \varphi(x, F(x))\,dx \tag{3.1}$$

over all $F \in \mathcal{G}$, where $\mathcal{G}$ is the class of all admissible cdf's, i.e., cdf's satisfying the following constraints:

$$\int_{-X}^{X} x\,dF(x) = \mu_1, \qquad \int_{-X}^{X} x^2\,dF(x) = \mu_2, \qquad F(X) = 1. \tag{3.2}$$

Here $\mu_1$ and $\mu_2$ are such that $\mu_2 > \mu_1^2$ and $\mu_2 < X^2$. In this case there exist cdf's satisfying (3.2), and hence the class $\mathcal{G}$ is non-null.

Integrating by parts the integrals in (3.2), the restrictions become

$$\int_{-X}^{X} F(x)\,dx = X - \mu_1, \qquad \int_{-X}^{X} xF(x)\,dx = \frac{X^2 - \mu_2}{2}. \tag{3.3}$$
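The passage from (3.2) to (3.3) can be checked exactly for a distribution with finitely many atoms, where $F$ is a step function and both integrals reduce to finite sums. The three-point law below is a hypothetical example chosen only for illustration; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as Fr

def cdf_integrals(atoms, probs, X):
    """Return (int F dx, int x F dx) over [-X, X] for a discrete law.

    atoms must be sorted and lie in [-X, X]; F is constant on each
    [pts[k], pts[k+1]) and equals the total mass at or left of pts[k].
    """
    pts = [-X] + list(atoms) + [X]
    cum, i0, i1 = Fr(0), Fr(0), Fr(0)
    for k in range(len(pts) - 1):
        if 1 <= k <= len(atoms):
            cum += probs[k - 1]          # pass the atom located at pts[k]
        a, b = pts[k], pts[k + 1]
        i0 += cum * (b - a)
        i1 += cum * (b * b - a * a) / 2
    return i0, i1

X = Fr(1)
atoms = [Fr(-1), Fr(0), Fr(1, 2)]
probs = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]
mu1 = sum(p * a for a, p in zip(atoms, probs))       # first moment
mu2 = sum(p * a * a for a, p in zip(atoms, probs))   # second moment
i0, i1 = cdf_integrals(atoms, probs, X)
# (3.3) predicts i0 == X - mu1 and i1 == (X**2 - mu2) / 2.
```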


We shall first show that an admissible minimizing (maximizing) cdf exists. Let $\mathcal{F}$ be the class of all cdf's defined on $[-X, X]$. Then the following is well known [8]:

LEMMA 3.1. $\mathcal{F}$ is convex and compact in the topology of convergence in distribution. (The compactness in Lemma 3.1 is a restatement of the Helly-Bray lemma.)

Define a transformation $T: \mathcal{F} \to R^3$ such that

$$T \circ F = \left(\int_{-X}^{X} \varphi(x, F(x))\,dx,\ \int_{-X}^{X} F(x)\,dx,\ \int_{-X}^{X} xF(x)\,dx\right).$$

It is easy to see that $T$ is a continuous transformation, as $\varphi(x, y)$ is continuous in $y$. But a continuous transformation maps a compact set into a closed and bounded set [8]. Hence we have the following lemma.

LEMMA 3.2. The set $\Gamma$ of points

$$\left(\int_{-X}^{X} \varphi(x, F(x))\,dx,\ \int_{-X}^{X} F(x)\,dx,\ \int_{-X}^{X} xF(x)\,dx\right), \qquad F \in \mathcal{F},$$

is a closed and bounded set in $R^3$.

The restrictions (3.3) define a cross section $\Gamma_1$ of the closed and bounded set $\Gamma$, and hence $\Gamma_1$ is also closed and bounded. Therefore the minimizing and maximizing points exist and are given by boundary points of $\Gamma_1$, so long as $\Gamma_1$ is non-null. But $\Gamma_1$ is non-null, as $\mathcal{G}$ is non-null, as seen before. Hence the minimizing and maximizing admissible cdf's exist.


4. Reduction of the minimum problem to subsidiary problems and uniqueness of its solution. In this section we first prove the uniqueness of the minimizing cdf, using the property of strict convexity of the function $\varphi$ in its second argument. In characterizing the solution of the minimizing problem, we use Karlin's method [3] to reduce the main problem of minimizing the integral (3.1) over the class $\mathcal{G}$ of admissible cdf's to a subsidiary problem of minimizing an integral of a related function over all cdf's in $\mathcal{F}$. This reduction, together with the uniqueness of the minimizing cdf, gives us a characterization of the minimizing cdf, which we give in the next section.

LEMMA 4.1. There is a unique cdf $F_0$ which minimizes (3.1) subject to the side conditions (3.3) when $\varphi(x, y)$ is strictly convex in $y$.

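PROOF. Suppose the solution is not unique. Let $F_0(x)$ and $F_1(x)$ be two distinct admissible cdf's which minimize (3.1), and let

$$M = \min_{F \in \mathcal{G}} \int_{-X}^{X} \varphi(x, F(x))\,dx.$$

Since the constraints (3.3) are linear in $F$, $\lambda F_0 + (1 - \lambda)F_1$ is again admissible for $0 < \lambda < 1$. Two distinct right-continuous cdf's differ on an interval, so, as $\varphi$ is strictly convex in $y$,

$$\int_{-X}^{X} \varphi(x, \lambda F_0(x) + (1 - \lambda)F_1(x))\,dx < \lambda\int_{-X}^{X} \varphi(x, F_0(x))\,dx + (1 - \lambda)\int_{-X}^{X} \varphi(x, F_1(x))\,dx = \lambda M + (1 - \lambda)M = M.$$

But $M$ is the minimum, and hence we have a contradiction.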

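We shall now prove the following lemmas, with the help of which we shall reduce the main problem to a simpler problem.

LEMMA 4.2. $F_0(x)$ minimizes (3.1) if and only if

$$\int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, y)\Big|_{y = F_0(x)} F_0(x)\,dx \le \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, y)\Big|_{y = F_0(x)} F(x)\,dx \tag{4.1}$$

for all $F \in \mathcal{G}$.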
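PROOF. For any other admissible cdf $F(x)$, define

$$I(\lambda) = \int_{-X}^{X} \varphi(x, \lambda F_0(x) + (1 - \lambda)F(x))\,dx, \qquad 0 \le \lambda \le 1.$$

As $\varphi$ is twice differentiable in $y$, $\partial\varphi/\partial y$ exists and is continuous in $y$, and hence $I(\lambda)$ is differentiable and is given by

$$I'(\lambda) = \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, \lambda F_0(x) + (1 - \lambda)F(x))\,[F_0(x) - F(x)]\,dx.$$

Since $\varphi$ is strictly convex in $y$, it follows very easily that $I(\lambda)$ is a strictly convex function of $\lambda$. If $F_0(x)$ minimizes (3.1), then $I(\lambda)$ achieves its minimum at $\lambda = 1$, and this is possible if and only if $I'(\lambda)|_{\lambda=1} \le 0$, i.e.,

$$\int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, y)\Big|_{y=F_0(x)} [F_0(x) - F(x)]\,dx \le 0,$$

or

$$\int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, y)\Big|_{y=F_0(x)} F_0(x)\,dx \le \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, y)\Big|_{y=F_0(x)} F(x)\,dx.$$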


Conversely, let (4.1) hold. Then $I'(\lambda)|_{\lambda=1} \le 0$, and hence by the strict convexity of $I(\lambda)$ we have $I(1) < I(0)$, or

$$\int_{-X}^{X} \varphi(x, F_0(x))\,dx < \int_{-X}^{X} \varphi(x, F(x))\,dx,$$

i.e., $F_0(x)$ minimizes (3.1). This proves the lemma.
We use the following notation:

$$I_{F_0}(F) = \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F(x)\,dx.$$


With the help of the above lemma, we find that the problem $P_1$ of minimizing $I(F)$ over all admissible cdf's is related to the problem $P_{2F_0}$ of finding an admissible $F(x)$ which minimizes $I_{F_0}(F)$. In fact, we are interested in finding an $F_0$ such that $F_0$ is a solution of $P_{2F_0}$. This looks like a complicated problem, but we now have a problem linear in $F$, which is relatively easy to deal with.

Because $P_1$ has a unique solution, Lemma 4.2 implies that there is one and only one $F_0$ such that $F_0$ solves $P_{2F_0}$. This, however, does not mean that $P_{2F_0}$ has a unique solution.

Let $T: \mathcal{F} \to \Gamma_2$ be a transformation given by

$$T \circ F = \left(\int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F(x)\,dx,\ \int_{-X}^{X} F(x)\,dx,\ \int_{-X}^{X} xF(x)\,dx\right).$$

Obviously $T$ is bounded and linear, and hence continuous, in $F$. But as $\mathcal{F}$ is convex and compact in the topology of convergence in distribution by Lemma 3.1, the transformation $T$ maps this convex and compact set into a convex and compact set $\Gamma_2$, and hence we have the following result.

LEMMA 4.3. $\Gamma_2$ is a convex, closed and bounded set in three dimensions.

Solving $P_{2F_0}$ corresponds to finding the minimum of the first coordinate among all points of $\Gamma_2$ for which

$$\int_{-X}^{X} F(x)\,dx = X - \mu_1, \qquad \int_{-X}^{X} xF(x)\,dx = \frac{X^2 - \mu_2}{2},$$

and this will be a boundary point of the set $\Gamma_2$.







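Suppose $F_0$ solves $P_{2F_0}$. Then $F_0$ corresponds to a boundary point of $\Gamma_2$, and there is a supporting hyperplane of $\Gamma_2$ at the minimum point $\gamma_0 = (u_0, v_0, w_0)$, i.e., for some $\eta_0, \eta_1, \eta_2$ and $\eta_3$ ($\eta_0, \eta_1, \eta_2$ not all zero),

$$\eta_0 u_0 + \eta_1 v_0 + \eta_2 w_0 + \eta_3 = 0 \tag{4.2}$$

and

$$\eta_0 u + \eta_1 v + \eta_2 w + \eta_3 \ge 0 \tag{4.3}$$

for all other points $(u, v, w)$ belonging to $\Gamma_2$, where

$$u = \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F(x)\,dx, \qquad v = \int_{-X}^{X} F(x)\,dx, \qquad w = \int_{-X}^{X} xF(x)\,dx;$$

therefore,

$$\eta_0(u - u_0) + \eta_1(v - v_0) + \eta_2(w - w_0) \ge 0. \tag{4.5}$$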
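We shall see below that $\eta_0$ can be taken positive, and hence can be normalized so as to be equal to one. Therefore, by taking $\eta_0 = 1$ in (4.5), we have

$$\int_{-X}^{X} \left[\frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + \eta_2 x\right] F(x)\,dx \ge \int_{-X}^{X} \left[\frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + \eta_2 x\right] F_0(x)\,dx.$$

Hence $F_0(x)$ minimizes

$$\int_{-X}^{X} \left[\frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + \eta_2 x\right] F(x)\,dx \tag{4.6}$$

among the class $\mathcal{F}$ of all cdf's on $[-X, X]$.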

Conversely, if $F_0(x) \in \mathcal{G}$ minimizes (4.6), we have, retracing the steps, that $(u - u_0) + \eta_1(v - v_0) + \eta_2(w - w_0) \ge 0$. Suppose $F(x)$ is admissible; then $v = v_0$ and $w = w_0$, and hence $u - u_0 \ge 0$, i.e.,

$$\int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F(x)\,dx \ge \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F_0(x)\,dx.$$

In other words, $F_0$ minimizes $I_{F_0}(F)$ over all admissible cdf's $\mathcal{G}$.

We shall now show that $\eta_0$ can be taken positive. Let

$$\Gamma_2^* = \{(u^*, v, w): u^* \ge u,\ (u, v, w) \in \Gamma_2\}.$$

Then $\Gamma_2^*$ is obviously convex and $\Gamma_2 \subset \Gamma_2^*$. Now $u_0$ is the minimum of $u$ subject to the conditions $v = v_0$ and $w = w_0$. This implies that $(u_0, v_0, w_0)$ is also a minimum point of $\Gamma_2^*$ and hence is a boundary point of it. Hence there is an $(\eta_0, \eta_1, \eta_2) \ne (0, 0, 0)$ such that

$$\eta_0(u^* - u_0) + \eta_1(v - v_0) + \eta_2(w - w_0) \ge 0 \tag{4.7}$$

for points $(u^*, v, w)$ belonging to $\Gamma_2^*$. Hence

$$\eta_0(u - u_0) + \eta_1(v - v_0) + \eta_2(w - w_0) \ge 0$$

for $(u, v, w) \in \Gamma_2$. Suppose $\eta_0 = 0$. Then we have

$$\eta_1(v - v_0) + \eta_2(w - w_0) \ge 0,$$

i.e.,

$$\int_{-X}^{X} (\eta_1 + \eta_2 x)F(x)\,dx \ge \int_{-X}^{X} (\eta_1 + \eta_2 x)F_0(x)\,dx,$$

i.e., $F_0$ minimizes $\int_{-X}^{X} (\eta_1 + \eta_2 x)F(x)\,dx$ over all $F \in \mathcal{F}$. Now $\eta_1 + \eta_2 x$ is either nondecreasing or nonincreasing according as $\eta_2 \ge 0$ or $\eta_2 \le 0$, and hence the unique minimizing cdf of the above integral is a two-point distribution with its total mass concentrated at $-X$ and $X$, so that $\mu_2 = X^2$. But such a cdf is not admissible, and hence there is a contradiction.

It is easily seen now that $\eta_0$ is not negative. Suppose $\eta_0$ is negative. Consider then a point $(u_0 + h, v_0, w_0) \in \Gamma_2^*$ for some $h > 0$, so that from (4.7) we obtain

$$\eta_0 h \ge 0,$$

which is again a contradiction. Hence $\eta_0$ is positive.

REMARK. Another way to show that $\eta_0 \ne 0$ would be as follows: $\eta_0 = 0$ corresponds to boundary points of the set $\Gamma_2$ where the supporting hyperplanes are parallel to the $u$-axis, and hence $(v_0, w_0)$ corresponds to the boundary of the projection of $\Gamma_2$ on the $(v, w)$ plane. But the conditions on the first two moments are such that the given point $(v_0, w_0)$ will be interior to the projection set, and hence $\eta_0 \ne 0$.

The previous argument applies only in the special case of the first two moments of $F(x)$ being given. In general, when more moments are specified, the latter argument will apply if we impose conditions on the given moments such that the given point is interior to the moment space, which is analogous to the projection of the set $\Gamma_2$. It then follows easily that $\eta_0 > 0$ in general. Let

$$I_{F_0\eta_1\eta_2}(F) = \int_{-X}^{X} \left[\frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + \eta_2 x\right] F(x)\,dx,$$

and let the problem $P_{3F_0\eta_1\eta_2}$ be that of finding the minimum of $I_{F_0\eta_1\eta_2}(F)$ over all $F(x) \in \mathcal{F}$.

The above results are summarized in the following lemma.

LEMMA 4.4. $F_0$ solves $P_{2F_0}$ if $F_0$ is an admissible cdf which solves $P_{3F_0\eta_1\eta_2}$, and any $F_0$ which solves $P_{2F_0}$ solves $P_{3F_0\eta_1\eta_2}$ for some $\eta_1$ and $\eta_2$.


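5. Characterization of the solution. In this section we characterize the solution of the minimum problem in terms of $f_{\eta_1\eta_2}(x)$, which is that value of $y$ for which

$$\frac{\partial\varphi}{\partial y}(x, y) + \eta_1 + \eta_2 x = 0.$$

Write $B_{\eta_1\eta_2}(x, y) = \frac{\partial\varphi}{\partial y}(x, y) + \eta_1 + \eta_2 x$ and let

$$A(x) = B_{\eta_1\eta_2}(x, F_0(x)) = \frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + \eta_2 x.$$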


Since $\partial\varphi(x, y)/\partial y$ is continuous in $y$, $A(x)$ can have a discontinuity only if $F_0(x)$ has a jump. But as $\partial\varphi(x, y)/\partial y$ is increasing in $y$, the discontinuities of $A(x)$ are upward jumps. Also, since $\partial\varphi/\partial y$ is continuous in the closed and bounded region $S = \{(x, y): -X \le x \le X,\ 0 \le y \le 1\}$, $A(x)$ is bounded. We then have the following theorems, which characterize the solution of our problems.

THEOREM 5.1. If $F_0$ solves $P_{3F_0\eta_1\eta_2}$, the set $\{x: A(x) \ne 0,\ -X < x < X\}$ has $F_0$-measure zero.

PROOF. Suppose that $F_0$ is continuous on the right. Consider the set $S_+ = \{x: A(x) > 0,\ -X < x < X\}$. It is a denumerable union of intervals $[x_1, x_2)$. We shall show that

$$F_0(x_2-) = F_0(x_1-),$$

and therefore that the interval $[x_1, x_2)$ has $F_0$-measure zero. Suppose this were not the case. Then, since $A(x) > 0$ on $[x_1, x_2)$ and $F_0(x) \ge F_0(x_1-)$ there,

$$\int_{x_1}^{x_2} A(x)F_0(x_1-)\,dx < \int_{x_1}^{x_2} A(x)F_0(x)\,dx,$$

so replacing $F_0$ by the constant $F_0(x_1-)$ on $[x_1, x_2)$ would lower the value of the integral, a contradiction. Consequently, $S_+$ has $F_0$-measure zero.

Consider now the set $S_- = \{x: A(x) < 0,\ -X < x < X\}$. Because all discontinuities are upward jumps and $A(x)$ is continuous on the right, $S_-$ is an open set. Hence $S_-$ is a denumerable union of intervals $(x_1, x_2)$. We shall see that $F_0(x_2) = F_0(x_1)$, so that the $F_0$-measure of the interval $(x_1, x_2)$ is zero. We also prove this by contradiction, as otherwise replacing $F_0$ by $F_0(x_2)$ on $(x_1, x_2)$ would give

$$\int_{x_1}^{x_2} A(x)F_0(x_2)\,dx < \int_{x_1}^{x_2} A(x)F_0(x)\,dx.$$

Since $S_-$ is a denumerable union of such intervals, $S_-$ also has $F_0$-measure zero.

Hence the above arguments show that the set $\{x: A(x) \ne 0,\ -X < x < X\}$ has $F_0$-measure zero.

REMARKS. 1. It is easy to see that if $A(-X) > 0$, then $F_0(-X) = 0$, and if $A(X) < 0$, $F_0$ is continuous at $X$.

2. The following corollary shows that the integral of $A(x)$ is zero over intervals on which $F_0$ is constant.

COROLLARY. If $F_0(x)$ is such that $F_0(x) = c$, $0 < c < 1$, for $a \le x < b$, and $F_0(x) < c$ for $x < a$, $F_0(x) > c$ for $x > b$, then

$$\int_a^b A(x)\,dx = 0.$$







PROOF. Suppose $\int_a^b A(x)\,dx < 0$ and $b < X$. Replace $F_0(x)$ on the interval $[a, b + \delta)$, for any small number $\delta > 0$, by the constant quantity $F_0(b + \delta)$. Let $\nu$ be the increase in $I_{F_0\eta_1\eta_2}(F)$ due to this replacement. Then, with $|A(x)| < M$,

$$\begin{aligned} \nu &= \int_a^{b+\delta} A(x)F_0(b + \delta)\,dx - \int_a^{b+\delta} A(x)F_0(x)\,dx \\ &= [F_0(b + \delta) - c]\int_a^{b+\delta} A(x)\,dx - \int_b^{b+\delta} A(x)[F_0(x) - c]\,dx \\ &\le [F_0(b + \delta) - c]\left(\int_a^b A(x)\,dx + 2M\delta\right). \end{aligned}$$

Since $F_0(b + \delta) > c$, letting $\delta \to 0$ we find that $\nu$ becomes negative, and hence there is a contradiction. The case where $b = X$ is trivial.

If we suppose that $\int_a^b A(x)\,dx > 0$ and $a$ is a point of continuity of $F_0$, then by an argument similar to that above we get a contradiction when $a > -X$, by replacing $F_0$ on $(a - \delta, b)$ by $F_0(a - \delta)$ and letting $\delta \to 0$. In case $a = -X$ or there is a jump in $F_0$ at $a$, the proof is trivial.

REMARK. If $K(x)$ is a function satisfying the properties of the function $A(x)$, then the problem $P_K$ (corresponding to $P_{3F_0\eta_1\eta_2}$) of finding a cdf $F_0$ such that

$$I_K(F_0) = \min_{F \in \mathcal{F}} I_K(F) = \min_{F \in \mathcal{F}} \int_{-X}^{X} K(x)F(x)\,dx$$

gives the same results as stated in Theorem 5.1 and its corollary, i.e., if $F_0$ is the solution of $P_K$,

(a) the set $\{x: K(x) \ne 0,\ -X < x < X\}$ is of $F_0$-measure zero;

(b) $\int K(x)F_0(x)\,dx$ is zero over intervals where $F_0$ is constant.

THEOREM 5.2. If $F_0$ solves $P_{3F_0\eta_1\eta_2}$, then $F_0$ has no jumps on the open interval $(-X, X)$, and hence $A(x)$ is continuous on $(-X, X)$.

PROOF. Let $F_0$ have a jump at $x_0$, $-X < x_0 < X$. Then by Theorem 5.1, $A(x_0) = 0$. But since $\partial\varphi/\partial y$ is strictly increasing in $y$, $x_0$ is the right-hand end-point of an interval on which $A(x) < 0$. By the same arguments as in the proof of Theorem 5.1, we see that on this interval $F_0(x) = F_0(x_0)$, and hence $F_0$ has no discontinuity at $x_0$. But because discontinuities of $A(x)$ arise only on account of jumps of $F_0$, there are no discontinuities in $A(x)$, i.e., $A(x)$ is continuous on the interval $(-X, X)$.

Let $f_{\eta_1\eta_2}(x)$ be defined, with $0 \le f_{\eta_1\eta_2}(x) \le 1$, such that $B_{\eta_1\eta_2}(x, f_{\eta_1\eta_2}(x)) = 0$. (The function $f_{\eta_1\eta_2}$ is defined on that subset of $[-X, X]$ for which there exists a $y$ between 0 and 1 such that $B_{\eta_1\eta_2}(x, y) = 0$.)

As $\partial\varphi(x, y)/\partial y$ is continuous and strictly increasing in $y$, $f_{\eta_1\eta_2}(x)$ is continuous wherever it is defined. If $0 < f_{\eta_1\eta_2}(x_0) < 1$, then $f_{\eta_1\eta_2}$ is defined in some interval about $x_0$ (the interval is one-sided if $x_0 = \pm X$). Graphically, $f_{\eta_1\eta_2}$ represents a number of curve segments which terminate when $f_{\eta_1\eta_2}(x)$ is zero or one.

More specifically, $f_{\eta_1\eta_2}(x)$ is defined on a union of closed intervals, at the end-points of each of which it is either zero or one. Let $[a_i, b_i]$ and $[a_j, b_j]$ be two such intervals, not separated by any others, such that $b_i \le a_j$; then $f_{\eta_1\eta_2}(b_i) = f_{\eta_1\eta_2}(a_j)$. If there are an infinite number of intervals $[a_j, b_j]$ in the neighborhood of $b_i$, it follows that

$$f_{\eta_1\eta_2}(b_i) = f_{\eta_1\eta_2}(a_j) = f_{\eta_1\eta_2}(b_j)$$

for $b_j$ sufficiently close to $b_i$. Hence the following definition of a function $g_{\eta_1\eta_2}$ has a meaning.
DEFINITION. Define $g_{\eta_1\eta_2}$ to be that unique function on $[-X, X]$, continuous on $[-X, X)$, such that

$$g_{\eta_1\eta_2}(x) = \begin{cases} f_{\eta_1\eta_2}(x), & \text{where } f_{\eta_1\eta_2} \text{ is defined}, \\ 0 \text{ or } 1, & \text{elsewhere}, \end{cases}$$

and $g_{\eta_1\eta_2}(X) = 1$, provided that the subset of $[-X, X)$ on which $f_{\eta_1\eta_2}$ is defined is non-null.
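A pointwise sketch of $f_{\eta_1\eta_2}$ and $g_{\eta_1\eta_2}$, assuming $\partial\varphi/\partial y$ is supplied as a Python function that is continuous and strictly increasing in $y$, so that the root of $B_{\eta_1\eta_2}(x, \cdot)$ on $[0, 1]$ can be found by bisection. Where no root exists, the sketch uses a sign rule (value 0 when $B_{\eta_1\eta_2}(x, y) > 0$ for every $y$, value 1 when it is negative for every $y$); this rule is an assumption of the sketch about how the "0 or 1" choice resolves under the continuity requirement, not a statement from the paper. The test uses $\partial\varphi/\partial y = 2(y - x)$, i.e. $\varphi(x, y) = (y - x)^2$, with hypothetical multipliers.

```python
def f_root(dphi_dy, x, eta1, eta2, tol=1e-12):
    """Root y in [0, 1] of B(x, y) = dphi_dy(x, y) + eta1 + eta2 * x, or None."""
    def B(y):
        return dphi_dy(x, y) + eta1 + eta2 * x
    lo, hi = 0.0, 1.0
    if B(lo) > 0 or B(hi) < 0:
        return None                 # f_{eta1 eta2} is undefined at x
    while hi - lo > tol:            # B is increasing in y, so bisection applies
        mid = 0.5 * (lo + hi)
        if B(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_value(dphi_dy, x, eta1, eta2, X):
    """g_{eta1 eta2}(x) on [-X, X] under the sign rule described above."""
    if x >= X:
        return 1.0                  # g(X) = 1 by definition
    y = f_root(dphi_dy, x, eta1, eta2)
    if y is not None:
        return y
    # No root: B has one sign on all of [0, 1] (assumed resolution of "0 or 1").
    return 0.0 if dphi_dy(x, 0.0) + eta1 + eta2 * x > 0 else 1.0

def dphi(x, y):
    return 2.0 * (y - x)            # phi(x, y) = (y - x)^2
```

For this $\varphi$ the root has the closed form $x - (\eta_1 + \eta_2 x)/2$, which gives an easy check of the bisection.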

THEOREM 5.3. If $F_0$ solves $P_{3F_0\eta_1\eta_2}$, then for $-X \le x < X$, $F_0$ coincides with $g_{\eta_1\eta_2}$ except on intervals on which $F_0$ is constant.

PROOF. From Theorems 5.1 and 5.2 we know that $F_0$ has no jumps on $(-X, X)$ and that $F_0$ cannot increase where $A(x) \ne 0$. Therefore $F_0$ remains constant until it intersects $f_{\eta_1\eta_2}$.

COROLLARY. If $g_{\eta_1\eta_2}$ is a cdf, then $F_0(x) = g_{\eta_1\eta_2}(x)$.

REMARKS. 1. We can represent a conceivable situation by Fig. 1.

2. It must be noted that the corollary to Theorem 5.1 puts a strong restriction on the intervals on which $F_0$ is constant.

3. The solution in the general case may not be completely specified, but in what follows we consider some special cases where the minimizing cdf is completely characterized.

Special Cases.

I. When $\partial\varphi(x, y)/\partial y$ is nonincreasing in $x$.

THEOREM 5.4. If $\partial\varphi(x, y)/\partial y$ is nonincreasing in $x$ and $\eta_2 < 0$, then $F_0(x) = g_{\eta_1\eta_2}(x)$ for $-X \le x < X$.


[Fig. 1. Legend: the marked curve denotes $F_0(x)$.]







PROOF. If the conditions of the theorem hold, $B_{\eta_1\eta_2}(x, y)$ is, for each fixed $y$, a decreasing function of $x$, and hence $f_{\eta_1\eta_2}$ is increasing in $x$, so that $g_{\eta_1\eta_2}$ is a cdf. We get the result of the theorem, then, by the corollary to Theorem 5.3.

II. When $\varphi$ is a function of $y$ alone, i.e., $\varphi(x, y) = \psi(y)$.

LEMMA 5.1. If $\varphi(x, y) = \psi(y)$, then, corresponding to $F_0$, which is the solution of $P_1$, $\eta_2$ is negative.

PROOF. If $\eta_2 \ge 0$, $\eta_1 + \eta_2 x$ is nondecreasing. Also, as $\psi'(y)$ is nondecreasing in $y$ and $F_0$ is nondecreasing in $x$, the function $A(x) = \psi'(F_0(x)) + \eta_1 + \eta_2 x$ is nondecreasing in $x$. Therefore $f_{\eta_1\eta_2}(x)$ is nonincreasing, and hence from Theorem 5.3 it follows that $F_0$ is constant on $[-X, X)$. But for such $F_0$, $\mu_2 = X^2$, and hence $F_0$ is not admissible. Therefore $\eta_2 < 0$.

Theorem 5.4 and Lemma 5.1 imply, then, the following theorem.

THEOREM 5.5. If $\varphi(x, y) = \psi(y)$, the solution $F_0$ of $P_1$ is given by $g_{\eta_1\eta_2}$ for some $\eta_1, \eta_2$.

REMARK. Unfortunately it is not always true that $\eta_2 < 0$, as assumed in Theorem 5.4. In fact, for side conditions corresponding to small variance, one has $\eta_2 > 0$. It might still happen that $f_{\eta_1\eta_2}$ is nondecreasing, and then the result of Theorem 5.4 still holds. In any case, Theorem 5.3 with the corollary to Theorem 5.1 gives a useful characterization of the solution of our problem.


6. Comparison of our Technique with that of Gumbel [10] and Chernoff and Reiter [2].

(i) Gumbel's Method. The problem considered by Gumbel is that of maximizing

$$\int_0^1 x(F)\,nF^{n-1}\,dF, \tag{6.1}$$

where $x(F)$ denotes the inverse of the cdf, with restrictions

$$\int_0^1 x(F)\,dF = 0, \qquad \int_0^1 x^2(F)\,dF = 1.$$

A variational technique has been used; the form obtained for the maximizing cdf is given by equating to zero the first variation of

$$\int_0^1 \{n\,x(F)F^{n-1} + \eta_1 x(F) + \eta_2 x^2(F)\}\,dF, \tag{6.2}$$

i.e.,

$$nF^{n-1} + \eta_1 + 2\eta_2 x(F) = 0, \qquad 0 < F < 1. \tag{6.3}$$

The above equation gives a sort of sufficiency condition, as any admissible $F$ given by (6.3) does maximize the integral (6.2) and hence maximizes (6.1). David and Hartley [1] have given an ingenious argument to prove the sufficiency of the solution, but that seems unnecessary. However, the above equation does not give the necessity of the solution, since this approach does not provide an argument for proving that the constants $\eta_1$ and $\eta_2$ which make the cdf admissible always exist.
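For fixed $n$, (6.3) can be solved in closed form: it gives the quantile function $x(F) = -(nF^{n-1} + \eta_1)/(2\eta_2)$, and the two restrictions force $\eta_1 = -1$ and $\eta_2 = -(n-1)/(2\sqrt{2n-1})$ (taking $\eta_2 < 0$ so that $x(F)$ increases in $F$). The sketch below is an illustration not taken from the paper: it checks the restrictions numerically and evaluates (6.1), which should equal $(n-1)/\sqrt{2n-1}$, the well-known bound on the expected largest of $n$ observations from a law with mean 0 and variance 1.

```python
import math

def gumbel_quantile(n):
    """Quantile function x(F) solving (6.3) with the multipliers fixed by the restrictions."""
    eta1 = -1.0                                       # from int x(F) dF = 0
    eta2 = -(n - 1) / (2.0 * math.sqrt(2 * n - 1))    # from int x(F)^2 dF = 1
    return lambda F: -(n * F ** (n - 1) + eta1) / (2.0 * eta2)

def midpoint01(g, m=200_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / m
    return h * sum(g((k + 0.5) * h) for k in range(m))

n = 5
x = gumbel_quantile(n)
mean = midpoint01(x)
second = midpoint01(lambda F: x(F) ** 2)
e_max = midpoint01(lambda F: x(F) * n * F ** (n - 1))   # the integral (6.1)
bound = (n - 1) / math.sqrt(2 * n - 1)                  # 4/3 for n = 5
```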







This method also extends to the case of a bounded random variable as treated 
in this paper. 

We shall use the above approach for our problem. Integrating by parts, we have

$$\int_{-X}^{X} \varphi(x, F(x))\,dx = x\,\varphi(x, F(x))\Big|_{-X}^{X} - \int_{-X}^{X} x\,d\varphi(x, F(x)),$$

where

$$d\varphi(x, F(x)) = \frac{\partial\varphi}{\partial x}(x, F(x))\,dx + \frac{\partial\varphi}{\partial y}(x, F(x))\,dF.$$

When $\varphi(x, F(x))$ is a function of $F$ alone, say $\psi(F(x))$, $d\varphi = \psi'(F)\,dF$, and hence the problem of minimizing

$$\int_{-X}^{X} \psi(F)\,dx$$

is the same as that of maximizing

$$\int_0^1 x(F)\,\psi'(F)\,dF.$$

Hence, using Gumbel's approach, we maximize

$$\int_0^1 \{x(F)\psi'(F) + \eta_1 x(F) + \eta_2 x^2(F)\}\,dF$$

and get the following equation satisfied by the admissible maximizing cdf:

$$\psi'(F) + \eta_1 + 2\eta_2 x(F) = 0.$$


In the above case, our technique would also lead to a similar equation. But in the general case, when we consider $\varphi(x, F(x))$, Gumbel's approach does not seem to apply.


(ii) Chernoff and Reiter Method. Chernoff and Reiter [2] consider the problem of minimizing and maximizing

$$\int g(x)\,dF(x),$$

with side conditions

$$\int x\,dF(x) = c_1, \qquad \int x^2\,dF(x) = c_2,$$

such that $c_2 > c_1^2$ and $g(x)$ is a continuous function of $x$.

In the process of reduction of our main problem, we have an intermediary problem $P_{2F_0}$ of finding the minimum of

$$I_{F_0}(F) = \int_{-X}^{X} \frac{\partial\varphi}{\partial y}(x, F_0(x))\,F(x)\,dx$$

over all admissible cdf's $\mathcal{G}$. Now as

$$f(x) = \int_{-X}^{x} \frac{\partial\varphi}{\partial y}(t, F_0(t))\,dt$$

is continuous in $x$, integrating $I_{F_0}(F)$ by parts we have

$$I_{F_0}(F) = \text{constant} - \int_{-X}^{X} f(x)\,dF(x),$$

the constant being $f(X)$, since $F(X) = 1$ and $f(-X) = 0$; so we maximize

$$\int_{-X}^{X} f(x)\,dF(x)$$

over all admissible cdf's $\mathcal{G}$. Now as $f(x)$ is continuous, by the methods of Chernoff and Reiter the necessary condition for the maximum is given by the following.

(a) There are $\eta_1$ and $\eta_2$ such that, when $x$ is a point of continuity of $F_0(x)$,

$$B_{\eta_1\eta_2}(x, F_0(x)) = \frac{\partial\varphi}{\partial y}(x, F_0(x)) + \eta_1 + 2\eta_2 x = 0,$$

except on a set of $F_0$-measure zero.

(b) $F_0$ has no jump in $-X < x < X$; otherwise either $B_{\eta_1\eta_2}(x, F_0(x)) > 0$ or $B_{\eta_1\eta_2}(x, F_0(x-)) < 0$. Hence we get a result similar to our result obtained in Section 5, i.e., the set

$$\{x: B_{\eta_1\eta_2}(x, F_0(x)) \ne 0,\ -X < x < X\}$$

has $F_0$-measure zero.
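The integration by parts used above, $I_{F_0}(F) = f(X) - \int f(x)\,dF(x)$ with $f(-X) = 0$ and $F(X) = 1$, can be checked exactly when $F$ is a step function. In this sketch the coefficient $\partial\varphi/\partial y(t, F_0(t))$ is replaced by the hypothetical choice $h(t) = t$, so that $f(x) = (x^2 - X^2)/2$; the atoms and masses are likewise illustrative only.

```python
from fractions import Fraction as Fr

X = Fr(1)

def f(x):
    """f(x) = int_{-X}^{x} h(t) dt for the hypothetical coefficient h(t) = t."""
    return (x * x - X * X) / 2

atoms = [Fr(-1, 2), Fr(1, 4), Fr(1)]      # sorted support points in [-X, X]
probs = [Fr(1, 3), Fr(1, 6), Fr(1, 2)]

def lhs():
    """int_{-X}^{X} h(x) F(x) dx, with F constant between consecutive atoms."""
    pts = [-X] + atoms + [X]
    cum, total = Fr(0), Fr(0)
    for k in range(len(pts) - 1):
        if 1 <= k <= len(atoms):
            cum += probs[k - 1]                       # pass the atom at pts[k]
        total += cum * (f(pts[k + 1]) - f(pts[k]))    # integral of h over the piece
    return total

def rhs():
    """f(X) - int f dF."""
    return f(X) - sum(p * f(a) for a, p in zip(atoms, probs))
```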


7. Examples. In this section we discuss some examples to illustrate the method of obtaining the minimizing cdf for our problem for some specified function $\varphi$. We also discuss an example of the special case $\varphi(x, y) = \psi(y)$. We have included an example where the methods of the paper have been used to solve a problem of a different type.

EXAMPLE 1. Consider the problem of finding

$$\min_{F \in \mathcal{G}} \int_{-X}^{X} [F(x) - x]^2\,dx,$$

where $\mathcal{G}$ denotes the admissible class of cdf's as in Section 3.

This is a special case of a more general problem where we minimize $\int_{-X}^{X} \psi(F(x) - x)\,dx$ for $F \in \mathcal{G}$. Here $\varphi(x, y) = \psi(y - x)$, $\psi$ being a strictly convex, bounded, and continuous function of its argument. This problem has also been discussed by Birnbaum and Klose [7], as a lemma to derive a lower bound for the variance of the Mann-Whitney statistic.

If $\varphi(x, y) = \psi(y - x)$, $\varphi$ is strictly convex in $y$, and it is easy to verify that $\partial^2\varphi/\partial x\,\partial y = -\psi''(y - x) \le 0$. We know that the solution exists and is unique, and the problem is reduced to that of finding $\eta_1$ and $\eta_2$ so that $F_0$ solves $P_{3F_0\eta_1\eta_2}$, where $P_{3F_0\eta_1\eta_2}$ is the problem of finding $F \in \mathcal{F}$ which minimizes

$$\int_{-X}^{X} [\psi'(F_0(x) - x) + \eta_1 + \eta_2 x]\,F(x)\,dx.$$

Then by the theorems of Section 5, we know that $F_0(x)$ is given in terms of $g_{\eta_1\eta_2}(x)$, where $g_{\eta_1\eta_2}(x)$ is uniquely expressed in terms of the function

$$f_{\eta_1\eta_2}(x) = x + (\psi')^{-1}(-\eta_1 - \eta_2 x).$$
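Returning to our example, we have the function

$$f_{\eta_1\eta_2}(x) = (1 - \eta_2)x - \eta_1$$

(here $\psi'(t) = 2t$, and the factor 2 has been absorbed into $\eta_1$ and $\eta_2$), so that

$$f_{\eta_1\eta_2}(x') = 0 \ \text{ for } x' = \frac{\eta_1}{1 - \eta_2}, \qquad f_{\eta_1\eta_2}(x'') = 1 \ \text{ for } x'' = \frac{1 + \eta_1}{1 - \eta_2}.$$

Case 1. $\eta_2 < 1$. Then $f_{\eta_1\eta_2}$ is increasing. Define $g_{\eta_1\eta_2}(x)$ by

$$g_{\eta_1\eta_2}(x) = \begin{cases} 0, & x < \max(-X, x'), \\ 1, & x \ge \min(x'', X), \\ (1 - \eta_2)x - \eta_1, & \text{elsewhere}. \end{cases}$$

As $g_{\eta_1\eta_2}$ is a cdf, $F_0(x) = g_{\eta_1\eta_2}(x)$.

Case 2. $\eta_2 \ge 1$. Then $f_{\eta_1\eta_2}(x)$ is nonincreasing, and hence the solution is either a one-point distribution or a two-point distribution with total probability concentrated at $-X$ and $X$. In both cases, then, the solution is not admissible.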
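In Case 1 the minimizing cdf is the line $(1 - \eta_2)x - \eta_1$ clipped to $[0, 1]$, with $x' = \eta_1/(1 - \eta_2)$ and $x'' = (1 + \eta_1)/(1 - \eta_2)$ the points where the line equals 0 and 1, so its moments can be read off from (3.3). The sketch below (an illustration, not the paper's computation) builds this cdf for given multipliers and recovers $\mu_1 = X - \int F\,dx$ and $\mu_2 = X^2 - 2\int xF\,dx$ by a midpoint rule. Finding the $\eta_1, \eta_2$ that match prescribed moments would additionally need a two-dimensional root search, which is not shown. When $x'$ and $x''$ both lie inside $[-X, X]$ the cdf is uniform on $[x', x'']$, which gives an easy check.

```python
def example1_g(eta1, eta2, X):
    """Case 1 cdf of Example 1 (eta2 < 1): the line (1 - eta2) x - eta1 clipped to [0, 1]."""
    if not eta2 < 1:
        raise ValueError("Case 1 requires eta2 < 1")
    def g(x):
        if x >= X:
            return 1.0                      # g(X) = 1: any leftover mass sits at X
        return min(1.0, max(0.0, (1.0 - eta2) * x - eta1))
    return g

def moments_via_33(F, X, m=100_000):
    """mu1 = X - int F dx and mu2 = X^2 - 2 int x F dx over [-X, X] (midpoint rule)."""
    h = 2.0 * X / m
    i0 = i1 = 0.0
    for k in range(m):
        x = -X + (k + 0.5) * h
        i0 += F(x) * h
        i1 += x * F(x) * h
    return X - i0, X * X - 2.0 * i1

# eta1 = -0.5, eta2 = -1 gives x' = -0.25, x'' = 0.25: uniform on [-0.25, 0.25].
mu1, mu2 = moments_via_33(example1_g(-0.5, -1.0, 1.0), 1.0)
```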

EXAMPLE 2. Consider the same problem as in Example 1, but with an additional restriction on the cdf $F(x)$, namely $F(x) \ge x$. Now let $F(x)$ be a cdf on $[0, 1]$.

Under this additional restriction, the class $\mathcal{G}^*$ of admissible cdf's is also compact and convex. Then the solution to this problem exists. It is unique since $\varphi(x, y) = (y - x)^2$ is strictly convex in $y$.

Applying the methods used to prove Lemma 4.2, we see that the problem is the same as that of minimizing

$$\int_0^1 [F_0(x) - x]\,F(x)\,dx$$

over all $F \in \mathcal{G}^*$. It is easy to see that the set analogous to $\Gamma_2$ of Lemma 4.3 here is also convex, closed and bounded, and hence, applying the method of Lemma 4.4, we reduce the problem to that of finding the cdf's corresponding to

$$\min_{F \in \mathcal{F}^*} \int_0^1 [F_0(x) - x + \eta_1 + \eta_2 x]\,F(x)\,dx = \min_{F \in \mathcal{F}^*} \int_0^1 [F_0(x) + \eta_1 + \eta_3 x]\,F(x)\,dx, \qquad \eta_3 = \eta_2 - 1,$$

where $\mathcal{F}^*$ is the class of all cdf's $F$ on $[0, 1]$ such that $F(x) \ge x$. We can now apply the methods of Section 5. Define the function $f_{\eta_1\eta_3}$, with $x \le f_{\eta_1\eta_3}(x) \le 1$, such that

$$f_{\eta_1\eta_3}(x) = -\eta_1 - \eta_3 x.$$

Define the function $g_{\eta_1\eta_3}$ such that

$$g_{\eta_1\eta_3}(x) = \begin{cases} f_{\eta_1\eta_3}(x), & \text{where } f_{\eta_1\eta_3} \text{ is defined}, \\ x \text{ or } 1, & \text{elsewhere on } [0, 1], \end{cases}$$

$g_{\eta_1\eta_3}(x)$ is continuous on $[0, 1)$, and $g_{\eta_1\eta_3}(1) = 1$.

Then $g_{\eta_1\eta_3}$ gives the solution $F_0$ of the problem if $g_{\eta_1\eta_3}(x)$ is a cdf. We shall give an explicit characterization of $g_{\eta_1\eta_3}$ in the various possible cases. Let the point of intersection of $y = -\eta_1 - \eta_3 x$ and $y = x$ be denoted by $x^* = -\eta_1/(1 + \eta_3)$. Let $x^{**}$ be such that $-\eta_1 - \eta_3 x^{**} = 1$.
Case I. m3 > —1. 
(a) z* 30 
(0, 
Jains (2) oa if 
1, 


This g,,,, is a cdf, but it is not admissible. 
(b) $x^* > 1$:
$$g_{\eta_1\eta_3}(x) = \begin{cases} 0, & x < 0,\\ -\eta_1 - \eta_3 x, & 0 \le x \le x^{**},\\ 1, & x > x^{**}. \end{cases}$$
If $x^{**} > 0$, $g_{\eta_1\eta_3}$ is a cdf and hence $F_0(x) = g_{\eta_1\eta_3}(x)$. If $x^{**} < 0$, the solution is a one-point distribution with mass concentrated at $x = 0$, which is not admissible.
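(c) $0 < x^* < 1$. Consider (i) $-1 < \eta_3 \le 0$:
$$g_{\eta_1\eta_3}(x) = \begin{cases} 0, & x < 0,\\ -\eta_1 - \eta_3 x, & 0 \le x \le x^*,\\ x, & x^* < x \le 1,\\ 1, & x > 1. \end{cases}$$
This is again a cdf, and hence $F_0(x) = g_{\eta_1\eta_3}(x)$.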
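(ii) $\eta_3 > 0$:
$$g_{\eta_1\eta_3}(x) = \begin{cases} 0, & x < 0,\\ -\eta_1 - \eta_3 x, & 0 \le x \le x^*,\\ x, & x^* < x \le 1,\\ 1, & x > 1. \end{cases}$$
This $g_{\eta_1\eta_3}$ is not a cdf, since it decreases on $[0, x^*]$. Here let $A(x) = F_0(x) + \eta_1 + \eta_3 x$. At $x = 0$, $A(0) > 0$ if $F_0(0) > -\eta_1$, and $A(0) < 0$ if $F_0(0) < -\eta_1$. Suppose there is a jump at $x = 0$ such that $F_0(0) > -\eta_1$; then $A(0) > 0$, and $F_0(x)$ is then continuous at $x = 0$, which is a contradiction. Let there instead be a jump at $x = 0$ such that $c = F_0(0) < -\eta_1$; then $A(0) < 0$, and hence we take as a possible minimizing cdf
$$G(x) = \begin{cases} 0, & x < 0,\\ c, & 0 \le x < c,\\ x, & c \le x < 1,\\ 1, & x \ge 1. \end{cases}$$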


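The remark after Theorem 5.1 puts the following restriction on $c$:
$$\int_0^c (c + \eta_1 + \eta_3 x)\,dx = 0, \quad\text{i.e.,}\quad c\left(c + \eta_1 + \tfrac12 \eta_3 c\right) = 0, \quad\text{or}\quad c = -\frac{\eta_1}{\eta_3/2 + 1},$$
since $c = 0$ gives an inadmissible cdf. Incidentally, this also shows, as is evident from the value of $c$ itself, that
$$x^* < c < -\eta_1.$$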
The unique value of $c$ obtained from the constraints exists if and only if
$$(1 - 2\mu_1)^3 = (1 - 3\mu_2)^2. \qquad (7.1)$$
This condition is obtained by eliminating $c$ between the equations
$$\int_0^1 G(x)\,dx = 1 - \mu_1 = c^2 + \tfrac12(1 - c^2) = \frac{1 + c^2}{2},$$
$$\int_0^1 2x\,G(x)\,dx = 1 - \mu_2 = c^3 + \tfrac23(1 - c^3) = \frac{2 + c^3}{3},$$
which give $c^2 = 1 - 2\mu_1$ and $c^3 = 1 - 3\mu_2$. Hence $G(x)$ is admissible and $F_0(x) = G(x)$ if and only if $\mu_1$ and $\mu_2$ are such that (7.1) is satisfied.
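As a check on the elimination leading to (7.1), the sketch below computes the first two moments of $G$ directly from its structure (an atom of mass $c$ at 0 and unit density on $(c, 1)$) and tests (7.1). The function names and the illustrative values $\eta_1 = -0.6$, $\eta_3 = 1$ (which fall under Case I (c)(ii), with $x^* = 0.3$) are assumptions for this example only.

```python
def moments_of_G(c):
    """First two moments of G: an atom of mass c at x = 0 plus density 1 on (c, 1)."""
    if not 0 < c < 1:
        raise ValueError("c must lie in (0, 1)")
    mu1 = (1 - c**2) / 2   # integral of x over (c, 1); equals 1 - (1 + c^2)/2
    mu2 = (1 - c**3) / 3   # integral of x^2 over (c, 1); equals 1 - (2 + c^3)/3
    return mu1, mu2


def c_from_eta(eta1, eta3):
    """Nonzero root of c * (c + eta1 + eta3 * c / 2) = 0."""
    return -eta1 / (eta3 / 2 + 1)


def satisfies_7_1(mu1, mu2, tol=1e-12):
    """Condition (7.1): (1 - 2 mu1)^3 == (1 - 3 mu2)^2, up to a tolerance."""
    return abs((1 - 2 * mu1) ** 3 - (1 - 3 * mu2) ** 2) <= tol


c = c_from_eta(-0.6, 1.0)
mu1, mu2 = moments_of_G(c)
print(c, mu1, mu2, satisfies_7_1(mu1, mu2))
```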
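Case II. $\eta_3 < -1$.

(a) $x^* < 0$. We then have
$$g_{\eta_1\eta_3}(x) = \begin{cases} 0, & x < 0,\\ -\eta_1 - \eta_3 x, & 0 \le x \le x^{**},\\ 1, & x > x^{**}. \end{cases}$$
If $x^{**} > 0$, $g_{\eta_1\eta_3}$ is a cdf and hence $F_0(x) = g_{\eta_1\eta_3}(x)$. If $x^{**} < 0$, the minimizing cdf is a one-point distribution and is not admissible.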

(b) $x^* > 1$. The minimizing cdf is the same as in Case I (a) and is not admissible.

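(c) $0 < x^* < 1$. Then we have
$$g_{\eta_1\eta_3}(x) = \begin{cases} 0, & x < 0,\\ x, & 0 \le x \le x^*,\\ -\eta_1 - \eta_3 x, & x^* < x \le x^{**},\\ 1, & x > x^{**}. \end{cases}$$
$g_{\eta_1\eta_3}$ is a cdf, and hence $F_0(x) = g_{\eta_1\eta_3}(x)$.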


EXAMPLE 3. Let $x_1 \le x_2 \le \cdots \le x_n$ be $n$ ordered independent observations from a cdf $F(x)$. Consider the problem of maximizing $E(x_n)$ with restrictions (3.2). The same problem for cdf's defined over the whole real line, with restrictions on mean and variance, has been discussed by Gumbel [10] and by David and Hartley [1].


$$E(x_n) = \int_{-X}^{X} x\,d\{F(x)\}^n.$$

Integrating by parts, $E(x_n) = X - \int_{-X}^{X} F^n(x)\,dx$, so the above problem reduces to that of finding
$$\min_{F \in \mathcal{G}} \int_{-X}^{X} F^n(x)\,dx.$$
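The integration-by-parts identity $E(x_n) = X - \int_{-X}^{X} F^n(x)\,dx$ can be checked on a case with a known answer. The sketch below uses the uniform distribution on $[-1, 1]$ (an illustrative choice, so $X = 1$), for which the largest of $n$ observations has mean $(n-1)/(n+1)$; the function names are hypothetical and the integral is approximated by the midpoint rule.

```python
def expected_max(cdf, n, X, steps=100_000):
    """E(x_n) = X - integral_{-X}^{X} F(x)^n dx, by the midpoint rule."""
    h = 2 * X / steps
    integral = h * sum(cdf(-X + (i + 0.5) * h) ** n for i in range(steps))
    return X - integral


def uniform_cdf(x):
    """Uniform distribution on [-1, 1] (illustrative choice, X = 1)."""
    return min(1.0, max(0.0, (x + 1) / 2))


for n in (1, 2, 3, 5):
    print(n, expected_max(uniform_cdf, n, 1.0), (n - 1) / (n + 1))
```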
Now, as $\varphi(x, y) = y^n$ is strictly convex in $y$ and is a function of $y$ alone, the solution $F(x)$ is given by the function $g_{\eta_1\eta_2}(x)$, where


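Here $\eta_1$ and $\eta_2$ are determined by the following four cases:

$$\begin{array}{lcc} & \max(x_1, -X) & \min(x_2, X)\\ \text{Case 1.} & x_1 & x_2\\ \text{Case 2.} & -X & x_2\\ \text{Case 3.} & -X & X\\ \text{Case 4.} & x_1 & X \end{array}$$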


We give below the equations determining $\eta_1$, $\eta_2$ in the above cases.

Case 1.
$$\mu_1 = -\frac{1}{\eta_2}(1 + \eta_1), \qquad \mu_2 = \frac{1}{\eta_2^2}\left(\eta_1^2 + 2\eta_1 + \frac{n^2}{2n - 1}\right).$$
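The Case 1 formulas can be checked by numerical integration. The sketch below assumes, as we read (3.2), that the restrictions are $\int x\,dF = \mu_1$ and $\int x^2\,dF = \mu_2$ on $[-X, X]$, and that on $[x_1, x_2]$ the solution is $F(x) = [-(\eta_1 + \eta_2 x)/n]^{1/(n-1)}$, the pointwise minimizer of $y^n + \eta_1 y + \eta_2 x y$; both are assumptions, as are the illustrative values $n = 3$, $\eta_1 = 0.2$, $\eta_2 = -2$, $X = 2$ and the function names.

```python
def case1_cdf(x, eta1, eta2, n):
    """Assumed Case 1 solution [-(eta1 + eta2 x)/n]^(1/(n-1)), clipped to [0, 1]."""
    u = -(eta1 + eta2 * x) / n
    if u <= 0:
        return 0.0
    if u >= 1:
        return 1.0
    return u ** (1 / (n - 1))


def moments_numeric(eta1, eta2, n, X, steps=200_000):
    """mu1 = X - int F dx and mu2 = X^2 - int 2x F dx over [-X, X] (midpoint rule)."""
    h = 2 * X / steps
    int_F = int_2xF = 0.0
    for i in range(steps):
        x = -X + (i + 0.5) * h
        F = case1_cdf(x, eta1, eta2, n)
        int_F += F * h
        int_2xF += 2 * x * F * h
    return X - int_F, X**2 - int_2xF


def moments_case1(eta1, eta2, n):
    """Closed forms of Case 1."""
    mu1 = -(1 + eta1) / eta2
    mu2 = (eta1**2 + 2 * eta1 + n**2 / (2 * n - 1)) / eta2**2
    return mu1, mu2


# Here x1 = -eta1/eta2 = 0.1 and x2 = -(n + eta1)/eta2 = 1.6 lie inside (-X, X).
print(moments_numeric(0.2, -2.0, 3, 2.0))
print(moments_case1(0.2, -2.0, 3))
```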


E = . (—m + nm X), 


n(n — 1) 


) n/(n—1) 
—— i +(n— 1)m - (n —= 1)é 


«(n+-259)] 





n-—1 
2 


: —_ In— n/n—1) - } 
xt |” ie 9" + 


N2 n 2n — 1 


x + 


nin nin l r 
a er > f=-(—m — »X), 


= 1.) en 


—1 
amen: § 


we = X* — a/& - 1)m pian + = — i) em 1) mt. 
N2 n 2n — 1 

Fig. 2 represents $\eta_1$ and $\eta_2$ in terms of $\mu_1$ and $\mu_2$ for the case $X =$

Similar results can easily be obtained for maximizing the expectations of the range and of the smallest observation under similar restrictions on the underlying cdf's.


[Fig. 2. $\eta_1$ and $\eta_2$ in terms of $\mu_1$ and $\mu_2$.]

8. Characterization of the solution of the maximum problem and some examples. In this section we find the solution to the problem of maximizing
$$I(F) = \int_{-X}^{X} \varphi(x, F(x))\,dx$$
over all admissible cdf's. The existence of the maximizing cdf has already been established by Lemma 3.2. We shall show now that the solution is a discrete distribution and, in our case, is at most a three-point distribution, i.e., a distribution concentrating all its mass at three or fewer points. Some illustrations are given at the end of this section.

THEOREM 8.1. The solution to the problem of maximizing $I(F)$ over the class $\mathcal{G}$ of admissible cdf's is at most a three-point distribution.

PROOF. The inequality (4.0) shows that a convex combination of two maximizing admissible cdf's, which is itself also admissible, gives a value smaller than the maximum. Hence the maximum of $I(F)$ occurs at cdf's that correspond to extreme points of the convex set $\mathcal{G}$.

By Theorems 21.1 and 21.3 of Karlin and Shapley [5], it is then easy to see that the maximizing cdf is at most a three-point distribution.

REMARK. It is important to note here that in some cases the maximizing admissible cdf can be further reduced to a two-point distribution [1], [3].

We shall illustrate the above results by a few examples. 

EXAMPLE 1. Suppose we want to minimize $E(x_n)$, given in Example 3 of the last section, over all admissible cdf's. The problem is the same as that of maximizing
$$\int_{-X}^{X} F^n(x)\,dx$$
over all admissible cdf's. As $\varphi(x, y) = y^n$ is strictly convex in $y$, the maximizing admissible cdf of the above integral is at most a three-point distribution. Similarly, the minimizing admissible cdf of $E(w_n)$ is at most a three-point distribution. David and Hartley [1] claim that it can be further reduced to a two-point distribution.
EXAMPLE 2. Suppose we are interested in finding the maximizing cdf of
$$\frac12 \int_{-X}^{X} [F(x) - x]^2\,dx$$
such that $F(x)$ satisfies side conditions (3.3). Now $\varphi(x, y) = \tfrac12 (y - x)^2$ is a strictly convex function of $y$. Hence the solution of the above problem is at most a three-point distribution.


9. Acknowledgment. The author is deeply grateful to Professor Herman 
Chernoff who suggested the problem and gave generous help and guidance 
throughout the entire work. 


REFERENCES

[1] H. O. Hartley and H. A. David, "Universal bounds for mean range and extreme observations," Ann. Math. Stat., Vol. 25 (March, 1954), pp. 85-89.

[2] H. Chernoff and S. Reiter, "Selection of a Distribution Function to Minimize an Expectation Subject to Side Conditions," Technical Report No. 23 (March 12, 1954), Applied Math. and Statistics Laboratory, Stanford University.

[3] S. Karlin, "Notes on Theory of Games," California Institute of Technology, mimeo (1954).

[4] S. Karlin, "Notes on Theory of Games and Decision Functions," Stanford University, mimeo (1955).

[5] S. Karlin and L. S. Shapley, "Geometry of Moment Spaces," Memoirs of the American Mathematical Society, No. 12.

[6] H. D. Brunk, G. M. Ewing and W. R. Utz, "Minimizing Integrals in Certain Classes of Monotone Functions with Applications," University of Missouri, Department of Mathematics, September, 1954.

[7] Z. W. Birnbaum and O. M. Klose, "Bounds for the variance of the U-statistic in terms of probability Y < X," Abstract, Bull. Amer. Math. Soc., 1955.

[8] M. Loève, "Probability Theory," D. Van Nostrand, New York, 1955.

[9] R. L. Plackett, "Limits of the ratio of mean range to standard deviation," Biometrika, Vol. 34 (1947), pp. 120-122.

[10] E. J. Gumbel, "The maxima of the mean largest value and of the range," Ann. Math. Stat., Vol. 25 (1954), pp. 76-84.

[11] W. Hoeffding, "The extrema of the expected value of a function of independent random variables," Ann. Math. Stat., Vol. 26 (1955), pp. 268-275.

[12] S. H. Khamis, "On the reduced moment problem," Ann. Math. Stat., Vol. 25 (1954), pp. 113-122.

[13] H. Rubin and S. Isaacson, "On Minimizing an Expectation Subject to Certain Side Conditions," Technical Report No. 25 (July, 1954).

[14] S. Moriguti, "Extremal properties of extreme value distributions," Ann. Math. Stat., Vol. 22 (1951), pp. 523-536.

[15] W. Hoeffding and S. S. Shrikhande, "Bounds for the distribution function of a sum of independent identically distributed random variables," Ann. Math. Stat., Vol. 26 (1955), pp. 439-449.





ON CONSISTENT ESTIMATES OF THE SPECTRUM OF A STATIONARY TIME SERIES¹,²


By EMANUEL PARZEN³


Columbia University, Hudson Laboratories, Dobbs Ferry, N.Y.


Summary. This paper is concerned with the spectral analysis of wide sense 
stationary time series which possess a spectral density function and whose 
fourth moment functions satisfy an integrability condition (which includes 
Gaussian processes). Consistent estimates are obtained for the spectral density 
function as well as for the spectral distribution function and a general class of 
spectral averages. Optimum consistent estimates are chosen on the basis of 
criteria involving the notions of order of consistency and asymptotic variance. 
The problem of interpolating the estimated spectral density, so that only a 
finite number of quantities need be computed to determine the entire graph, is 
also discussed. Both continuous and discrete time series are treated. 


1. Introduction. A stochastic process is a family of random variables $x(t)$, where $t$ varies in some set $T$. If the set $T$ is the infinite real line, then $x(t)$ is called a random function, and if $T = \{0, \pm 1, \pm 2, \dots\}$, then $x(t)$ is called a random sequence. If the parameter $t$ is interpreted as denoting time, then the stochastic process is called a time series, with the adjectives continuous or discrete being used to indicate whether it is a random function or a random sequence.

Let us suppose that we have observed a sample of length T of a (continuous 
or discrete) time series x(t). The general problem of time series analysis is to 
infer the statistical characteristics of x(t) from the observed sample. Now in 
order to perform a statistical analysis of x(t), one has to assume a model for 
x(t) which is completely specified except for the values of certain parameters 
which one proceeds to estimate on the basis of the observed sample. 

A widely adopted model for $x(t)$ (see Grenander and Rosenblatt [4], [5]) is the following. It is assumed that $x(t)$ may be written as the sum of a mean value function $m(t)$ and a fluctuation function $y(t)$:
$$x(t) = m(t) + y(t). \qquad (1.1)$$
The domain $T$ of the variable $t$ is to be taken as the infinite real line, $-\infty < t < \infty$, in the continuous case, and as the set of integers $0, \pm 1, \pm 2, \dots$ in the discrete case. We seek to treat both discrete and continuous time series simultaneously. Most equations will hold for both cases, with the proper interpretation, which will be explained as we proceed.

Received September 13, 1955; revised October 15, 1956.

¹ This work was supported by the Office of Naval Research under Contract N6-ONR-27135. Reproduction in whole or in part is permitted for any purpose of the United States Government.

² Hudson Laboratories Contribution No. 15. A more detailed version of this paper may be found in Hudson Laboratories Technical Report No. 33.

³ Now at Stanford University.

It is assumed that the function $m(t)$ is nonrandom, and that there is a fixed number $K$ of known functions $\varphi_1(t), \dots, \varphi_K(t)$ such that $m(t)$ may be written as a linear combination of the $\varphi_i(t)$:
$$m(t) = m_1\varphi_1(t) + \cdots + m_K\varphi_K(t). \qquad (1.2)$$
The constants $m_i$ (for $i = 1, \dots, K$) are unknown and are to be estimated from the sample.

The fluctuation function $y(t)$ is a stochastic process whose mean value function $Ey(t)$ vanishes identically in $t$. It is assumed that it possesses a finite second moment $E|y(t)|^2$, and that it is wide sense stationary, which means that the product moment $Ey(t)\,y(t + v)$ is independent of $t$ and depends only on $v$. One then defines the covariance function
$$R(v) = Ey(t)\,y(t + v). \qquad (1.3)$$


In the case of random functions, it is assumed that $R(v)$ is continuous. Then $R(v)$ possesses a representation as a Fourier-Stieltjes integral:
$$R(v) = \int e^{iv\omega}\,dF(\omega), \qquad (1.4)$$
where $F(\omega)$ is a bounded nondecreasing function, called the spectral distribution function of the process. The domain of the variable $v$ is the same as that of $t$, and the domain of the variable $\omega$ is $-\infty$ to $\infty$ in the continuous case, and $-\pi$ to $\pi$ in the discrete case. The domain of integration of an integral involving $\omega$ is to be taken as the whole domain of $\omega$ wherever it is not otherwise specified.

It is assumed next that $R(v)$ is summable. It then follows that the spectral distribution function $F(\omega)$ possesses a continuous density function $f(\omega)$, called the spectral density function of the time series $x(t)$. The following relations hold:
$$R(v) = \int e^{iv\omega} f(\omega)\,d\omega, \qquad (1.5)$$
$$f(\omega) = \frac{1}{2\pi}\int e^{-iv\omega} R(v)\,dv, \qquad (1.6c)$$
$$f(\omega) = \frac{1}{2\pi}\sum_{v} e^{-iv\omega} R(v). \qquad (1.6d)$$
In cases where the limits of integration (or summation) of an integral (or sum) involving the variables $u$ or $v$ are omitted, they are to be taken as $-\infty$ to $+\infty$. Henceforth, we write equations of the type of (1.6) only once, for the continuous case, with the understanding that for every such equation a corresponding equation may be written for the discrete case by replacing the integral by a sum. For certain important equations, we will write, without further explanation, two equations, with a suffix d for the discrete case and a suffix c for the continuous case.


The model for the process $x(t)$ which has just been described assumes only a knowledge of the first and second moments of the process, and assumes no knowledge of the probability distribution. The moments are assumed to be completely specified by the constants $m_1, \dots, m_K$ and the covariance function $R(v)$, or equivalently the spectral density function $f(\omega)$. By analysis of an observed time series is meant the estimation of the values of these quantities on the basis of observed samples. The estimation of the constants $m_1, \dots, m_K$ is called regression analysis, and the estimation of the spectral functions is called spectral analysis.

A basic requirement for an estimate is that it be consistent in quadratic mean. Let $m$ be an unknown parameter of a time series $x(t)$, and let $x(t)$ for $0 \le t \le T$ (or $t = 1, \dots, T$ in the discrete case) be an observed sample of the time series. An estimate $m_T$ of $m$, formed on the basis of the sample, is said to be consistent in quadratic mean if the mean square error $E|m_T - m|^2$ tends to zero as $T \to \infty$, where the expected value is taken under the assumption that $m$ is the true parameter value. If an estimate is consistent, it is then asymptotically unbiased, which means that $Em_T \to m$ as $T \to \infty$.

However, we shall be interested in estimates which are consistent and asymptotically unbiased at certain prescribed rates. Let $\alpha$ be a positive number. We define an estimate to be asymptotically unbiased of the order of $T^{-\alpha}$ if, for some finite constant $\beta$,
$$\lim_{T\to\infty} T^{\alpha}(Em_T - m) = \beta. \qquad (1.7)$$
We say that an estimate possesses an asymptotic variance $\sigma$ of the order of $T^{-2\alpha}$ if $\sigma$ is positive and
$$\lim_{T\to\infty} T^{2\alpha}\sigma^2[m_T] = \sigma, \qquad (1.8)$$
where $\sigma^2[m_T] = E|m_T - Em_T|^2$ is the variance of $m_T$. The importance of these notions derives from the central limit theorem for dependent random variables, from which one may hope to obtain conditions under which the normalized random variable $(m_T - Em_T)/\sigma[m_T]$ tends to a normal distribution. We define an estimate to be consistent of the order of $T^{-\alpha}$, with asymptotic bias $\beta$ and asymptotic variance $\sigma$, if (1.7) and (1.8) hold. If such an estimate obeys the central limit theorem, then the random variable $T^{\alpha}(m_T - m)$ tends to a normal distribution with mean $\beta$ and variance $\sigma$. Many estimates that one encounters are consistent of the order of $T^{-1/2}$; however, we will encounter below estimates which are consistent of a lower order.
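To make consistency of order $T^{-1/2}$ ($\alpha = \tfrac12$) concrete, the sketch below evaluates exactly $T\sigma^2[m_T] = \sum_{|v| < T}(1 - |v|/T)R(v)$ for the sample mean $m_T$ of a stationary first-order autoregression with unit innovation variance. The AR(1) model, its covariance $R(v) = \phi^{|v|}/(1 - \phi^2)$, the value $\phi = 0.5$ and the function names are illustrative assumptions, not part of the text. The scaled variance tends to $\sum_v R(v) = (1 - \phi)^{-2}$, while the variance itself tends to zero.

```python
def ar1_cov(v, phi):
    """R(v) for y(t) = phi*y(t-1) + e(t) with Var e(t) = 1 and |phi| < 1."""
    return phi ** abs(v) / (1 - phi**2)


def scaled_var_of_mean(T, phi):
    """T * Var(m_T) for m_T = (1/T) sum_{t=1}^T y(t): sum_{|v|<T} (1 - |v|/T) R(v)."""
    total = ar1_cov(0, phi)
    for v in range(1, T):
        total += 2 * (1 - v / T) * ar1_cov(v, phi)
    return total


phi = 0.5
for T in (10, 100, 1000, 10000):
    s = scaled_var_of_mean(T, phi)
    # Var(m_T) = s / T tends to 0; T * Var(m_T) tends to 1/(1 - phi)^2 = 4.
    print(T, s / T, s)
```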

A knowledge of the order of consistency, the asymptotic bias, and the asymp- 
totic variance of an estimate is valuable on several counts, as will be shown in 
detail in a later paper [8]. 

The problem of regression analysis has been extensively treated by Grenander and Rosenblatt in several excellent papers (see [5]), in which they obtained expressions for the asymptotic variances (of order $T^{-1}$) of various estimates of the constants $m_i$ in the model given above, and obtained conditions under which the least squares estimate and the best linear unbiased estimates have the same asymptotic variance. We mention regression analysis here only to point out that the results of this paper remain valid if, in estimating the spectrum, one uses the deviations of the observed values of $x(t)$ from the sample mean value function formed by inserting into (1.2) the least squares estimates of the constants $m_i$. As far as detailed considerations are concerned, we consider only the case where $m(t) = m$, an unknown constant.

The problem of outstanding interest at the present time in the analysis of time series is that of estimating the spectral density function, and it is this problem that is treated in this paper. In view of Eq. (1.6), the obvious way to estimate $f(\omega)$ is to form the Fourier transform $f_T(\omega)$ of the least squares estimate $R_T(v)$ of the covariance. The sample spectral density function $f_T(\omega)$ so obtained is essentially what has been studied by various authors under the name of the periodogram. However, as is well known, $f_T(\omega)$ is not a consistent estimate of $f(\omega)$.
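A small simulation makes the inconsistency concrete. The sketch assumes Gaussian white noise with unit variance, for which $f(\omega) = 1/(2\pi)$, and evaluates the periodogram (in the discrete form used in Section 3, with the sample mean removed) at the Fourier frequency $\omega = \pi/2$, taking $T$ divisible by 4. The function names, sample sizes, number of replications and seed are illustrative choices. Across replications the mean of $f_T(\pi/2)$ stays near $f$, but its variance stays near $f^2$ instead of shrinking as $T$ grows.

```python
import cmath
import math
import random


def periodogram(y, omega):
    """|sum_{t=1}^T Y_T(t) e^{-i t omega}|^2 / (2 pi T), with Y_T = y - sample mean."""
    T = len(y)
    mean = sum(y) / T
    s = sum((y[t] - mean) * cmath.exp(-1j * (t + 1) * omega) for t in range(T))
    return abs(s) ** 2 / (2 * math.pi * T)


def periodogram_mean_var(T, reps=2000, seed=0):
    """Monte Carlo mean and variance of f_T(pi/2) for unit Gaussian white noise."""
    rng = random.Random(seed)
    vals = [periodogram([rng.gauss(0.0, 1.0) for _ in range(T)], math.pi / 2)
            for _ in range(reps)]
    m = sum(vals) / reps
    var = sum((v - m) ** 2 for v in vals) / (reps - 1)
    return m, var


# Ratios to f = 1/(2 pi) and f^2; both stay near 1 for every T.
for T in (64, 256):
    m, var = periodogram_mean_var(T)
    print(T, m * 2 * math.pi, var * (2 * math.pi) ** 2)
```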

Rather, to begin with, we are only able to estimate what may be called spectral averages, that is, averages of the spectral density function of the form
$$J(A) = \int A(\omega) f(\omega)\,d\omega, \qquad (1.9)$$
where $A(\omega)$ is a suitably chosen function. On the one hand, $A(\omega)$ may be chosen to be a unit step function, $A(\omega) = 1$ or $0$ according as $\omega \le \omega_0$ or $\omega > \omega_0$. Then $J(A)$ represents the spectral distribution function $F(\omega_0)$. On the other hand, $A(\omega)$ may be a function highly peaked about a center frequency $\omega_0$.

In Section 5, we obtain a class of consistent estimates of the spectral density function at a point $\omega_0$. However, the order of consistency of these estimates will be $T^{-\alpha}$, where $0 < \alpha < \tfrac12$. Expressions are obtained for the asymptotic variance and bias of such estimates, so that the means are at hand for choosing among the large class of estimates presented. In Section 6, consistent estimates of the spectral density function, asymptotically optimum within the family of estimates considered, are discussed. In Section 7, we use the ideas leading to consistent estimates of the spectral density to obtain alternative estimates of the spectral averages. In Section 8, we treat the problem of interpolating the spectral density.


2. Assumptions on the fourth moments. Some further assumptions are required beyond those already stated. We assume that the fluctuation function $y(t)$ is wide sense stationary of order 4, in the sense that $E|y(t)|^4$ exists for all $t$, and the fourth moment function
$$P(v_1, v_2, v_3) = E\,y(t)\,y(t + v_1)\,y(t + v_2)\,y(t + v_3) \qquad (2.1)$$
is a function only of the time differences $v_1, v_2, v_3$, and not of the initial time $t$. Now if the process $y(t)$ were normally distributed, then $P(v_1, v_2, v_3)$ could be expressed in terms of the covariance function $R(v)$ as follows:
$$P_G(v_1, v_2, v_3) = R(v_1)R(v_2 - v_3) + R(v_2)R(v_3 - v_1) + R(v_3)R(v_1 - v_2). \qquad (2.2)$$


We introduce the function
$$Q(v_1, v_2, v_3) = P(v_1, v_2, v_3) - P_G(v_1, v_2, v_3), \qquad (2.3)$$
which is the difference between the actual fourth moment function of $y(t)$ and what it would be if $y(t)$ were Gaussian. We refer to $Q(v_1, v_2, v_3)$ as the non-Gaussian part of the fourth moment function of $y(t)$; it is the same as the fourth cumulant function.
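The non-Gaussian part $Q$ can be computed exactly for a simple non-Gaussian process. The sketch below assumes a finite moving average $y(t) = \sum_j c_j e(t - j)$ driven by independent innovations equal to $+1$ or $-1$ with probability $\tfrac12$ each (an illustrative choice, not from the text). It computes $P$ by enumerating all sign patterns, subtracts $P_G$ from (2.2), and compares the result with the standard fourth-cumulant formula for linear processes, $\kappa_4 \sum_j c_j c_{j+v_1} c_{j+v_2} c_{j+v_3}$ (nonnegative lags), where $\kappa_4 = E e^4 - 3 = -2$ here. For Gaussian innovations $\kappa_4 = 0$ and $Q$ vanishes. Function names are hypothetical.

```python
from itertools import product


def ma_cov(c, v):
    """R(v) = sum_j c_j c_{j+|v|} for y(t) = sum_j c_j e(t - j) with Var e(t) = 1."""
    v = abs(v)
    return float(sum(c[j] * c[j + v] for j in range(len(c) - v)))


def fourth_moment(c, v1, v2, v3):
    """P(v1, v2, v3) = E y(0) y(v1) y(v2) y(v3) for iid e(t) = +1 or -1 with
    probability 1/2 each, computed exactly by enumerating all sign patterns."""
    lags = (0, v1, v2, v3)
    times = range(min(lags) - (len(c) - 1), max(lags) + 1)
    total = 0.0
    for signs in product((-1, 1), repeat=len(times)):
        e = dict(zip(times, signs))
        term = 1.0
        for s in lags:
            term *= sum(cj * e[s - j] for j, cj in enumerate(c))
        total += term
    return total / 2 ** len(times)


def Q(c, v1, v2, v3):
    """Non-Gaussian part (2.3): P minus the Gaussian value P_G of (2.2)."""
    R = [ma_cov(c, v) for v in (v1, v2, v3, v2 - v3, v3 - v1, v1 - v2)]
    p_gauss = R[0] * R[3] + R[1] * R[4] + R[2] * R[5]
    return fourth_moment(c, v1, v2, v3) - p_gauss


def Q_cumulant(c, v1, v2, v3, kappa4=-2.0):
    """kappa4 * sum_j c_j c_{j+v1} c_{j+v2} c_{j+v3} for nonnegative lags."""
    def cc(i):
        return c[i] if 0 <= i < len(c) else 0.0
    return kappa4 * sum(cc(j) * cc(j + v1) * cc(j + v2) * cc(j + v3) for j in range(len(c)))


c = (1.0, 0.5)  # y(t) = e(t) + 0.5 e(t - 1)
print(Q(c, 0, 0, 0), Q_cumulant(c, 0, 0, 0))
```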

We assume that $Q(v_1, v_2, v_3)$ is absolutely summable (and, in the continuous parameter case, continuous) over all of $(v_1, v_2, v_3)$ space.

We will find in many instances that $Q(v_1, v_2, v_3)$ admits a representation as a Fourier integral:
$$Q(v_1, v_2, v_3) = \iiint \exp[i(\omega_1 v_1 + \omega_2 v_2 + \omega_3 v_3)]\,g(\omega_1, \omega_2, \omega_3)\,d\omega_1\,d\omega_2\,d\omega_3, \qquad (2.4)$$
where the function $g(\omega_1, \omega_2, \omega_3)$ is absolutely integrable over all of $(\omega_1, \omega_2, \omega_3)$ space. We may also have the relation
$$\int du\,Q(v_1, u, u + v_2) = 2\pi \iint d\omega_1\,d\omega_2\,g(\omega_1, -\omega_2, \omega_2)\exp[i(\omega_1 v_1 + \omega_2 v_2)]. \qquad (2.5)$$
We will assume these relations to be valid, since they simplify the writing of some of the results. It should be pointed out that the notion of the Fourier transform of the non-Gaussian part of the fourth moment function has previously been considered by Magness [6], where some examples may be found.

In the continuous parameter case we assume also that the stochastic process $x(t, w)$, where $w$ varies in a space $\Omega$ on which the basic probability measure $P$ is defined, is measurable jointly in $t$ and $w$. Then the random integrals employed, such as $\int_0^T x(t)\,dt$, exist with probability one by virtue of the Fubini theorem (see Doob [9]). Alternatively, the random integrals employed may be defined as limits in quadratic mean (see Loève [10]).


3. The sample covariance and spectral density functions. The estimates of the spectrum that we shall consider will be defined in terms of two functions, the sample covariance function and the sample spectral density function, which are defined in this section. Given a sample of observed values of $x(t)$ for $0 \le t \le T$ (or for $t = 1, \dots, T$), let $m_T$ be the least squares estimate of $m$, and consider the function $Y_T(t)$ defined by
$$Y_T(t) = \begin{cases} x(t) - m_T, & 0 \le t \le T,\\ 0, & \text{otherwise}. \end{cases} \qquad (3.1)$$
Define now the function
$$f_T(\omega) = \frac{1}{2\pi T}\left|\int_0^T Y_T(t)\,e^{-it\omega}\,dt\right|^2, \qquad (3.2c)$$
$$f_T(\omega) = \frac{1}{2\pi T}\left|\sum_{t=1}^{T} Y_T(t)\,e^{-it\omega}\right|^2, \qquad (3.2d)$$


which may be regarded as the notion of the "periodogram" extended to the case of time series with an unknown mean value. We call $f_T(\omega)$ the sample spectral density function, because its Fourier integral
$$R_T(v) = \int e^{iv\omega} f_T(\omega)\,d\omega \qquad (3.3)$$
is a consistent estimate of the covariance function. We call $R_T(v)$ the sample covariance function. It vanishes for $|v| \ge T$, and for $|v| < T$,
$$R_T(v) = \frac{1}{T}\int_0^{T - |v|} Y_T(t)\,Y_T(t + |v|)\,dt, \qquad (3.4c)$$
$$R_T(v) = \frac{1}{T}\sum_{t=1}^{T - |v|} Y_T(t)\,Y_T(t + |v|). \qquad (3.4d)$$
We may invert (3.3) to obtain
$$f_T(\omega) = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega} R_T(v)\,dv, \qquad (3.5c)$$
$$f_T(\omega) = \frac{1}{2\pi}\sum_{|v| < T} e^{-iv\omega} R_T(v). \qquad (3.5d)$$
In the continuous parameter case, the interval of integration in (3.3) is infinite, and to establish that $f_T(\omega)$ is summable one needs to employ a standard argument involving Plancherel's theorem.
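In the discrete case the equality of (3.2d) and (3.5d), and the effect of subtracting the sample mean, can be checked numerically. This is a minimal sketch with hypothetical function names and an arbitrary data vector; because the deviations $Y_T(t)$ sum to zero, $f_T(0) = 0$.

```python
import cmath
import math


def demean(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def periodogram_dft(x, omega):
    """(3.2d): |sum_{t=1}^T Y_T(t) e^{-i t omega}|^2 / (2 pi T)."""
    Y = demean(x)
    T = len(Y)
    s = sum(Y[t - 1] * cmath.exp(-1j * t * omega) for t in range(1, T + 1))
    return abs(s) ** 2 / (2 * math.pi * T)


def sample_cov(x, v):
    """(3.4d): R_T(v) = (1/T) sum_{t=1}^{T-|v|} Y_T(t) Y_T(t+|v|); zero for |v| >= T."""
    Y = demean(x)
    T, v = len(Y), abs(v)
    return sum(Y[t] * Y[t + v] for t in range(T - v)) / T


def periodogram_from_cov(x, omega):
    """(3.5d): (1/2 pi) sum_{|v|<T} e^{-i v omega} R_T(v)."""
    T = len(x)
    s = sum(cmath.exp(-1j * v * omega) * sample_cov(x, v) for v in range(-(T - 1), T))
    return s.real / (2 * math.pi)


x = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1]
print(periodogram_dft(x, 0.9), periodogram_from_cov(x, 0.9))
```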


An important role in the sequel will be played by the following representation of $R_T(v)$, for $|v| \le T$:
$$R_T(v) = D_T(v) + b_T(v) + R(v)\left(1 - \frac{|v|}{T}\right), \qquad (3.6)$$
where
$$D_T(v) = \frac{1}{T}\int_0^{T - |v|} dt\,\{y(t)\,y(t + |v|) - R(v)\}, \qquad (3.7)$$
and $b_T(v)$ is defined so as to make Eq. (3.6) correct.

The term $b_T(v)$ represents the bias arising from the fact that the sample covariances are computed using $Y_T(t)$, the deviations of the observations from the sample mean. That it may essentially be ignored in our calculations will follow from the fact that there is a constant $K$ such that
$$T^2 E|b_T(v)|^2 \le K \qquad (3.8)$$
for any $v$ and $T$. To establish (3.8), it suffices to show that there is a constant $K'$ such that, for any choice of numbers $T$ and $T_1, T_2, T_3, T_4$ satisfying $0 \le T_1, T_2, T_3, T_4 \le T$,
$$E\left|\int_{T_1}^{T_2}\int_{T_3}^{T_4} y(t_1)\,y(t_2)\,dt_1\,dt_2\right|^2 \le K'T^2, \qquad (3.9)$$
which follows from the fact that the expected value in (3.9) is less than
$$3\left(T\int |R(u)|\,du\right)^2 + T\iiint |Q(u_1, u_2, u_3)|\,du_1\,du_2\,du_3.$$


We next evaluate the covariance of $D_T(v)$. We obtain that, for any nonnegative numbers $v_1$ and $v_2$,
$$T\,E\,D_T(v_1)D_T(v_2) = \int_{-T}^{T} du\,U_T(u, v_1, v_2)\{Q(v_1, u, u + v_2) + R(u)R(u + v_2 - v_1) + R(u + v_2)R(u - v_1)\}, \qquad (3.10)$$
where $U_T(u, v_1, v_2)$ is a function with values between 0 and 1 defined as follows:
$$U_T(u, v_1, v_2) = \begin{cases} 0, & u \le -T + v_1,\\ 1 - \dfrac{v_1 - u}{T}, & -T + v_1 \le u \le \min(0, v_1 - v_2),\\ 1 - \dfrac{\max(v_1, v_2)}{T}, & \min(0, v_1 - v_2) \le u \le \max(0, v_1 - v_2),\\ 1 - \dfrac{u + v_2}{T}, & \max(0, v_1 - v_2) \le u \le T - v_2,\\ 0, & T - v_2 \le u. \end{cases}$$
To establish (3.10) one makes the change of variables $u = t_2 - t_1$, $v = t_1$ in the expression
$$T^2 E\,D_T(v_1)D_T(v_2) = \int_0^{T - v_1}\int_0^{T - v_2} dt_1\,dt_2\,\{Q(v_1, t_2 - t_1, t_2 - t_1 + v_2) + R(t_2 - t_1)R(t_2 - t_1 + v_2 - v_1) + R(t_2 - t_1 + v_2)R(t_2 - t_1 - v_1)\}.$$
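In the discrete case, $U_T(u, v_1, v_2)$ is $1/T$ times the number of integers $t_1$ with $1 \le t_1 \le T - v_1$ and $1 \le t_1 + u \le T - v_2$, which is what the change of variables produces. The sketch below (hypothetical function names, integer arguments with $0 \le v_1, v_2 < T$ assumed) compares that count with the piecewise formula above.

```python
def U_T_count(u, v1, v2, T):
    """(1/T) * #{t1 : 1 <= t1 <= T - v1 and 1 <= t1 + u <= T - v2}."""
    count = sum(1 for t1 in range(1, T - v1 + 1) if 1 <= t1 + u <= T - v2)
    return count / T


def U_T_formula(u, v1, v2, T):
    """Piecewise definition of U_T(u, v1, v2) accompanying (3.10)."""
    if u <= -T + v1 or u >= T - v2:
        return 0.0
    if u <= min(0, v1 - v2):
        return 1 - (v1 - u) / T
    if u <= max(0, v1 - v2):
        return 1 - max(v1, v2) / T
    return 1 - (u + v2) / T


T = 12
print([round(U_T_formula(u, 3, 1, T), 3) for u in range(-10, 12)])
```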
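As a first consequence of (3.8) and (3.10), we obtain the following theorem.

THEOREM 3. For any nonnegative numbers $v$, $v_1$, and $v_2$,
$$\lim_{T\to\infty} T^{1/2}|ER_T(v) - R(v)| = 0, \qquad (3.12)$$
$$\begin{aligned} \lim_{T\to\infty} T\,\mathrm{Cov}[R_T(v_1), R_T(v_2)] &= \int du\,\{Q(v_1, u, u + v_2) + R(u)R(u + v_2 - v_1) + R(u + v_2)R(u - v_1)\}\\ &= 2\pi\Big\{\iint d\omega_1\,d\omega_2\,\exp[i(\omega_1 v_1 + \omega_2 v_2)]\,g(\omega_1, -\omega_2, \omega_2)\\ &\qquad + \int d\omega\,f^2(\omega)\exp[i\omega(v_1 - v_2)] + \int d\omega\,f^2(\omega)\exp[i\omega(v_1 + v_2)]\Big\}. \end{aligned}$$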


4. Estimation of spectral averages. To estimate the spectral average $J(A)$ there are two methods available, which may be called the method of filtering and the method of covariance averages. In the method of filtering, one estimates the variance (zero lag covariance) of a new time series obtained by filtering the observed series. In the method of covariance averages, one defines a sample spectral average $J_T(A)$, which may be expressed as an average with respect to the sample spectral density function or with respect to the sample covariances. The latter form is the more convenient for computations. Only the method of covariance averages is discussed here.

Spectral averaging functions: A function $A(\omega)$ will be called a spectral averaging function if it is a real valued function which is both absolutely integrable and square integrable. Its Fourier transform
$$a(v) = \frac{1}{2\pi}\int e^{-iv\omega} A(\omega)\,d\omega \qquad (4.1)$$
is then bounded and square integrable. We call $a(v)$ a covariance averaging function. We assume finally that
$$|a(v)| = O(|v|^{-r}) \quad\text{for some } r > \tfrac12. \qquad (4.2)$$
If $A(\omega)$ has finite total variation (and also, in the continuous parameter case, vanishes at infinity), then $|a(v)| = O(|v|^{-1})$. From (4.2) it follows that, for some constant $K_1$ and some $\varepsilon > 0$,
$$\int_{-T}^{T} |a(v)|\,dv \le K_1 T^{1/2 - \varepsilon} \qquad (4.3)$$
for all $T$, and also that
$$\int |v|^{1/2}\,|a(v)R(v)|\,dv < \infty. \qquad (4.4)$$

A lemma: Of frequent use in the sequel will be the following lemma.

LEMMA 4. Let $q \ge 0$ and $s > 0$. Let $M_T$ be a sequence of constants tending to $\infty$ as $T \to \infty$. Suppose that
$$\int |v|^q\,|a(v)R(v)|\,dv < \infty. \qquad (4.5)$$
Then, as $T \to \infty$,
$$M_T^q \int_{|v| \ge M_T} |a(v)R(v)|\,dv \to 0, \qquad (4.6)$$
$$\frac{1}{M_T^s}\int_{|v| \le M_T} |v|^{q+s}\,|a(v)R(v)|\,dv \to 0. \qquad (4.7)$$

Sample spectral averages: The spectral average $J(A)$ may be defined in terms of either the spectral density or the covariance function by


$$J(A) = \int A(\omega) f(\omega)\,d\omega = \int a(v) R(v)\,dv. \qquad (4.8)$$
Accordingly, we define the sample spectral average $J_T(A)$ by
$$J_T(A) = \int A(\omega) f_T(\omega)\,d\omega = \int_{-T}^{T} a(v) R_T(v)\,dv. \qquad (4.9)$$
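The two forms of $J_T(A)$ in (4.9) can be compared numerically in the discrete case. The sketch assumes, for illustration, the band function $A(\omega) = 1$ for $|\omega| \le \omega_0$ and $0$ otherwise, whose covariance averaging function from (4.1) is $a(v) = \sin(v\omega_0)/(\pi v)$ with $a(0) = \omega_0/\pi$. It evaluates $\sum_{|v|<T} a(v)R_T(v)$ and a midpoint-rule integral of the periodogram over $[-\omega_0, \omega_0]$; the function names and data are hypothetical. For $\omega_0 = \pi$ both reduce to $R_T(0)$.

```python
import cmath
import math


def demean(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def sample_cov(Y, v):
    """(3.4d) for an already demeaned series Y."""
    T, v = len(Y), abs(v)
    return sum(Y[t] * Y[t + v] for t in range(T - v)) / T


def periodogram(Y, omega):
    """(3.2d) for an already demeaned series Y."""
    s = sum(y * cmath.exp(-1j * (t + 1) * omega) for t, y in enumerate(Y))
    return abs(s) ** 2 / (2 * math.pi * len(Y))


def a_band(v, w0):
    """(4.1) for A(omega) = 1 on |omega| <= w0 and 0 elsewhere."""
    return w0 / math.pi if v == 0 else math.sin(v * w0) / (math.pi * v)


def J_T_from_cov(x, w0):
    Y = demean(x)
    T = len(Y)
    return sum(a_band(v, w0) * sample_cov(Y, v) for v in range(-(T - 1), T))


def J_T_from_periodogram(x, w0, steps=20_000):
    Y = demean(x)
    h = 2 * w0 / steps
    return h * sum(periodogram(Y, -w0 + (i + 0.5) * h) for i in range(steps))


x = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1]
print(J_T_from_cov(x, 1.0), J_T_from_periodogram(x, 1.0))
```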
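The properties of $J_T(A)$ as an estimate of $J(A)$ are given in the following theorem.

THEOREM 4. For any spectral averaging functions $A(\omega)$, $A_1(\omega)$, and $A_2(\omega)$,
$$\lim_{T\to\infty} T^{1/2}|EJ_T(A) - J(A)| = 0, \qquad (4.10)$$
$$\lim_{T\to\infty} T\,\mathrm{Cov}[J_T(A_1), J_T(A_2)] = 4\pi\int d\omega\,f^2(\omega)A_1^*(\omega)A_2^*(\omega) + 2\pi\iint d\omega_1\,d\omega_2\,g(\omega_1, -\omega_2, \omega_2)A_1(\omega_1)A_2(\omega_2), \qquad (4.11)$$
where
$$A^*(\omega) = \tfrac12\{A(\omega) + A(-\omega)\}. \qquad (4.12)$$
PROOF. Omitted, since it is similar to the proofs of Theorems 5A and 5B.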


5. Estimation of the spectral density. Various authors have pointed out that the sample spectral density function, or periodogram, $f_T(\omega)$ is not a consistent estimate of the spectral density function $f(\omega)$. It has been suggested that $f(\omega)$ be estimated at a point $\omega_0$ by averaging the values of $f_T(\omega)$ in the neighborhood of $\omega_0$. However, this yields a consistent estimate not of $f(\omega_0)$, but rather of the spectral average in the neighborhood of $\omega_0$. To eliminate this bias, one needs to narrow the neighborhood averaged over as the sample size is increased. The manner in which this is to be done is examined in this section. A similar method of obtaining a consistent estimate of the spectral density is that of Bartlett (see [1]), who has suggested a modified form of the periodogram. More general methods of constructing consistent estimates of the spectral density have been given by Grenander [3] and Tukey [7]. In this section these methods are generalized somewhat further. A noteworthy feature of the general method of constructing consistent estimates of the spectral density discussed in this section is that one may construct estimates which are consistent of any prescribed order $T^{-\alpha}$, where $0 < \alpha < \tfrac12$.

Covariance averaging kernels: A function $k(x)$, defined for all real $x$, will be called a covariance averaging kernel if it fulfills the following conditions. It is even, bounded, square integrable, and normalized so that $k(0) = 1$. Its Fourier transform $K(\omega)$ is defined (as a limit in quadratic mean) to satisfy
$$k(x) = \int e^{ix\omega} K(\omega)\,d\omega. \qquad (5.1)$$


It is assumed that there is a constant $K_1$ and an $\varepsilon > 0$ such that
$$B\int_{-T}^{T} |k(Bv)|\,dv \le K_1 (BT)^{1/2 - \varepsilon}, \qquad (5.2c)$$
$$B\sum_{|v| \le T} |k(Bv)| \le K_1 (BT)^{1/2 - \varepsilon}, \qquad (5.2d)$$
for every $B$ and $T$. A sufficient condition for (5.2) to hold is that $k(x)$ satisfy (4.2).

Given a kernel $k(x)$ and a positive number $r$, define
$$k^{(r)} = \lim_{x \to 0} \frac{1 - k(x)}{|x|^r}. \qquad (5.3)$$
We assume next that there is a largest number $r$, called the characteristic exponent of the kernel $k(x)$, such that $k^{(r)}$ exists and is finite (nonzero). If the limit in (5.3) exists for every positive $r$, then the kernel is said to have characteristic exponent $\infty$.
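The characteristic exponent can be read off numerically from the ratio in (5.3). The sketch below (hypothetical helper names) evaluates it at small $x$ for the Bartlett kernel $1 - |x|$, the Daniell kernel $\sin x / x$ and the truncated kernel that appear later in this section. The ratios suggest exponents 1, 2 and $\infty$, with $k^{(1)} = 1$ for Bartlett and $k^{(2)} = 1/6$ for Daniell (from $\sin x / x = 1 - x^2/6 + \cdots$). A ratio evaluated at one small $x$ is only suggestive; it does not prove the limit.

```python
import math


def bartlett(x):
    return max(0.0, 1.0 - abs(x))


def daniell(x):
    return 1.0 if x == 0 else math.sin(x) / x


def truncated(x):
    return 1.0 if abs(x) <= 1 else 0.0


def ratio(k, r, x):
    """(1 - k(x)) / |x|^r; its limit as x -> 0 is k^{(r)} in (5.3)."""
    return (1.0 - k(x)) / abs(x) ** r


for name, k in (("Bartlett", bartlett), ("Daniell", daniell), ("truncated", truncated)):
    print(name, [ratio(k, r, 1e-3) for r in (1, 2, 3)])
```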

Estimates of the spectral density: Let $k(x)$ be a covariance averaging kernel and let $B_T$ be a sequence of constants tending to 0 as $T \to \infty$, in such a way that $B_T T \to \infty$. As an estimate of the spectral density function we define the even function
$$f_T^*(\omega) = \frac{1}{2\pi}\int_{-T}^{T} e^{-iv\omega} k(B_T v) R_T(v)\,dv, \qquad (5.4c)$$
$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{|v| \le T} e^{-iv\omega} k(B_T v) R_T(v). \qquad (5.4d)$$
The constant $B_T$ may be called the bandwidth of the estimate. In terms of the sample spectral density, one may write
$$f_T^*(\omega) = \int \frac{1}{B_T} K\!\left(\frac{\lambda - \omega}{B_T}\right) f_T(\lambda)\,d\lambda, \qquad (5.5)$$
where $f_T(\lambda)$ is to be defined as a periodic function in the discrete parameter case. Alternate ways in which $f_T^*(\omega)$ may be written in the discrete parameter case are
$$f_T^*(\omega) = \int_{-\pi}^{\pi} d\lambda\,f_T(\lambda)\,\frac{1}{B_T}\sum_{n=-\infty}^{\infty} K\!\left(\frac{\lambda - \omega + 2\pi n}{B_T}\right),$$
$$f_T^*(\omega) = \int_{-\pi}^{\pi} d\lambda\,f_T(\lambda)\,K_T(\lambda - \omega),$$
where $K_T(\omega)$ is defined so that
$$k(B_T v) = \int_{-\pi}^{\pi} e^{iv\omega} K_T(\omega)\,d\omega.$$

Various estimates of the spectral density which have been proposed (see Bartlett [1], [2], Grenander [3], Tukey [7]) may be obtained as instances of (5.4).


By choosing $k(x) = 1 - |x|$ if $|x| \le 1$ and $0$ otherwise, and letting $B_T = 1/M$, where $M$ is an integer less than $T$, one has a modified form of Bartlett's estimate:
$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{v=-M}^{M} \left(1 - \frac{|v|}{M}\right) e^{-iv\omega} R_T(v).$$
By choosing $k(x) = \sin x / x$ and letting $B_T = h$, one has Daniell's estimate:
$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{v=-T}^{T} e^{-iv\omega}\,\frac{\sin(hv)}{hv}\,R_T(v) = \frac{1}{2h}\int_{\omega - h}^{\omega + h} f_T(\lambda)\,d\lambda.$$
By choosing $k(x) = 1$ if $|x| \le 1$ and $0$ otherwise, and letting $B_T = 1/M$, one has the truncated estimate
$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{v=-M}^{M} e^{-iv\omega} R_T(v),$$
which, because the Fourier transform $K(\omega)$ of $k(x)$ is not nonnegative, has the possibly undesirable property that it is not necessarily nonnegative.
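A direct implementation of (5.4d) with the Bartlett and truncated kernels may help fix ideas. This is a minimal sketch under illustrative assumptions (simulated Gaussian white noise, $T = 64$, $M = 8$, a seeded generator, hypothetical function names). With $k \equiv 1$ it reproduces the periodogram; the Bartlett estimate is never negative because its spectral window is the nonnegative Fejér kernel, while the truncated estimate carries no such guarantee.

```python
import cmath
import math
import random


def demean(x):
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def sample_cov(Y, v):
    """(3.4d) for an already demeaned series Y."""
    T, v = len(Y), abs(v)
    return sum(Y[t] * Y[t + v] for t in range(T - v)) / T


def lag_window_estimate(x, omega, k, B):
    """(5.4d): (1/2 pi) sum_{|v|<T} e^{-i v omega} k(B v) R_T(v)."""
    Y = demean(x)
    T = len(Y)
    s = sum(cmath.exp(-1j * v * omega) * k(B * v) * sample_cov(Y, v)
            for v in range(-(T - 1), T))
    return s.real / (2 * math.pi)


def bartlett(z):
    return max(0.0, 1.0 - abs(z))


def truncated(z):
    return 1.0 if abs(z) <= 1 else 0.0


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
M = 8
grid = [math.pi * j / 50 for j in range(51)]
bart = [lag_window_estimate(x, w, bartlett, 1 / M) for w in grid]
trunc = [lag_window_estimate(x, w, truncated, 1 / M) for w in grid]
print(min(bart), min(trunc))  # Bartlett cannot go negative; truncated can
```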
The properties of the estimate $f_T^*(\omega)$ are embodied in the following theorems.

THEOREM 5A. The asymptotic covariance of the estimate $f_T^*(\omega)$ defined by (5.4) satisfies, for any nonnegative frequencies $\omega_1$ and $\omega_2$,
$$\lim_{T\to\infty} TB_T\,\mathrm{Cov}[f_T^*(\omega_1), f_T^*(\omega_2)] = f^2(\omega_1)\int k^2(x)\,dx\,\{1 + \delta(0, \omega_1)\}\,\delta(\omega_1, \omega_2), \qquad (5.6)$$
where $\delta(\omega_1, \omega_2) = 1$ if $\omega_1 = \omega_2$ and $0$ if $\omega_1 \ne \omega_2$. Further, for any $\varepsilon > 0$, the limit in (5.6) is uniform in $\omega_1$ and $\omega_2$ such that $\omega_1 \ge \varepsilon$ and $\omega_2 \ge \varepsilon$.

REMARK. The integral in (5.6) is not to be replaced by a summation in the discrete parameter case.
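For rough sample-size planning, (5.6) suggests approximating the variance of $f_T^*(\omega)$ at a fixed $\omega > 0$ by $f^2(\omega)\int k^2(x)\,dx/(TB_T)$, doubled at $\omega = 0$. The sketch below implements that large-$T$ reading (helper names hypothetical; it is an approximation, not a finite-sample result). For the Bartlett kernel $\int k^2 = 2/3$, and for the truncated kernel it is 2.

```python
def integral_k_squared(k, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of k(x)^2 over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(k(lo + (i + 0.5) * h) ** 2 for i in range(steps))


def approx_variance(f_omega, k2, T, B, at_zero=False):
    """Large-T reading of (5.6): f(omega)^2 * int k^2 / (T B), doubled at omega = 0."""
    return (2.0 if at_zero else 1.0) * f_omega**2 * k2 / (T * B)


def bartlett(x):
    return max(0.0, 1.0 - abs(x))


k2 = integral_k_squared(bartlett, -1.0, 1.0)   # exact value 2/3
print(k2, approx_variance(1.0, k2, T=1000, B=0.1))
```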
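THEOREM 5B. Let $q > 0$ be such that
$$\int |v|^q |R(v)|\,dv < \infty, \qquad (5.7c)$$
$$\sum_v |v|^q |R(v)| < \infty. \qquad (5.7d)$$
Define the generalized $q$th spectral derivative $f^{(q)}(\omega)$ by
$$f^{(q)}(\omega) = \frac{1}{2\pi}\int e^{-iv\omega}|v|^q R(v)\,dv, \qquad (5.8c)$$
$$f^{(q)}(\omega) = \frac{1}{2\pi}\sum_v e^{-iv\omega}|v|^q R(v). \qquad (5.8d)$$
Let the covariance averaging kernel $k(x)$ have characteristic exponent $r \ge q$. Let the constants $B_T$ be chosen so that
$$0 < \lim_{T\to\infty} TB_T^{1+2q} < \infty. \qquad (5.9)$$
Then, uniformly in $\omega$,
$$\lim_{T\to\infty} B_T^{-q}\,|Ef_T^*(\omega) - f(\omega)| = \begin{cases} |k^{(q)} f^{(q)}(\omega)|, & r = q,\\ 0, & r > q. \end{cases} \qquad (5.10)$$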
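Condition (5.9) fixes the bandwidth up to a constant, $B_T \approx cT^{-1/(1+2q)}$. Then, by (5.10) and Theorem 5A, the bias is of order $B_T^q = T^{-q/(1+2q)}$ and the variance of order $(TB_T)^{-1} = T^{-2q/(1+2q)}$, so $f_T^*$ is consistent of order $T^{-\alpha}$ with $\alpha = q/(1+2q)$. Letting $q$ range over $(0, \infty)$ gives every $\alpha$ in $(0, \tfrac12)$, in line with the claim at the start of this section. The sketch below (hypothetical names; the constant $c$ is an arbitrary choice) computes both quantities.

```python
def bandwidth(T, q, c=1.0):
    """B_T = c * T^(-1/(1+2q)), so that T * B_T^(1+2q) = c^(1+2q) as (5.9) requires."""
    return c * T ** (-1.0 / (1 + 2 * q))


def consistency_order(q):
    """alpha such that bias ~ B_T^q ~ T^-alpha and variance ~ 1/(T B_T) ~ T^-2alpha."""
    return q / (1 + 2 * q)


for q in (0.5, 1, 2, 10):
    print(q, consistency_order(q), bandwidth(10**6, q))
```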

PROOFS. We first show that the term $b_T(v)$, defined by (3.6), has no effect, by showing that, uniformly in $\omega$,
$$\lim_{T\to\infty} TB_T\,E\left|\int_{-T}^{T} dv\,e^{-iv\omega} k(B_T v)\,b_T(v)\right|^2 = 0. \qquad (5.11)$$
By Minkowski's inequality, (3.8), and (5.2), the square root of the quantity in (5.11) whose limit is being taken is less than
$$K^{1/2}(TB_T)^{-1/2} B_T\int_{-T}^{T} dv\,|k(B_T v)| \le K^{1/2} K_1 (TB_T)^{-\varepsilon},$$
which tends to 0 as $T \to \infty$.
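We next establish Theorem 5B. By (3.6), we write

$$2\pi f_T^*(\omega) = \int_{-T}^{T} dv\,e^{-i\omega v} k(B_T v)\left\{D_T(v) + b_T(v) + R(v)\left(1 - \frac{|v|}{T}\right)\right\}. \tag{5.12}$$

Therefore

$$\begin{aligned} 2\pi B_T^{-q}\left(E f_T^*(\omega) - f(\omega)\right) &= E\,B_T^{-q}\int_{-T}^{T} dv\,e^{-i\omega v} k(B_T v)\,b_T(v)\\ &\quad - B_T^{-q}\int_{-T}^{T} dv\,e^{-i\omega v}\left(1 - k(B_T v)\right) R(v)\\ &\quad - B_T^{-q}\int_{-T}^{T} dv\,e^{-i\omega v}\,\frac{|v|}{T}\,k(B_T v) R(v)\\ &\quad - B_T^{-q}\int_{|v| \ge T} dv\,e^{-i\omega v} R(v). \end{aligned} \tag{5.13}$$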

We now show that the first, third, and fourth terms in (5.13) tend to 0 as $T \to \infty$, uniformly in $\omega$. From (5.9) and (5.11), it follows that, uniformly in $\omega$,

$$\lim_{T\to\infty} E\left|\frac{1}{B_T^{q}}\int_{-T}^{T} dv\,e^{-i\omega v} k(B_T v)\,b_T(v)\right|^2 = 0.$$

Next, if $M$ is an upper bound for $|k(v)|$, the third term in (5.13) tends to 0 by (5.7) and (5.9) if $q \ge 1$, and if $q < 1$ it is in absolute value less than

$$\frac{M}{(T B_T)^q}\,\frac{1}{T^{1-q}}\int_{-T}^{T} |v|\,|R(v)|\,dv,$$

which tends to 0 by (5.7) and Lemma 4. Similarly, the fourth term in (5.13), which is in absolute value less than

$$\frac{1}{(T B_T)^q}\int_{|v| \ge T} |v|^q\,|R(v)|\,dv,$$

tends to 0.

Consequently, (5.10) is proved for the case that the kernel has characteristic exponent $r = q$, since the second term in (5.13) then tends, uniformly in $\omega$, to $-2\pi k^{(q)} f^{(q)}(\omega)$. Next, to prove (5.10) for the case that $r > q$, it suffices to show that

$$\lim_{T\to\infty} B_T^{-q}\int_{-T}^{T} |1 - k(B_T v)|\,|R(v)|\,dv = 0. \tag{5.14}$$

Let $M$, $M_1$, and $D$ be positive constants such that $|k(v)| \le M$ for all $v$, and

$$|1 - k(v)| \le M_1 |v|^r \quad \text{for } |v| \le D. \tag{5.15}$$

If the characteristic exponent is infinite, we may take any exponent $r > q$ in (5.15). Let $s = r - q$. Then the quantity in (5.14) whose limit is being taken is less than

$$M_1 B_T^{s}\int_{|v| \le D B_T^{-1}} |v|^{q+s}\,|R(v)|\,dv + (1 + M) B_T^{-q}\int_{|v| > D B_T^{-1}} |R(v)|\,dv,$$

which tends to 0 in view of (5.7) and Lemma 4.
We next establish Theorem 5A. In view of the foregoing, it follows that the desired asymptotic covariance in (5.6) is given by the limit, as $T \to \infty$, of

$$\frac{1}{\pi^2}\,T B_T\int_0^T\!\!\int_0^T dv_1\,dv_2\,\cos\omega_1 v_1\cos\omega_2 v_2\,k(B_T v_1) k(B_T v_2)\,E D_T(v_1) D_T(v_2). \tag{5.16}$$

We may write (5.16) as a sum of three 3-fold integrals, by replacing $T E D_T(v_1) D_T(v_2)$ by its value (3.10). The term in this sum which involves $Q(v_1, u, u + v_2)$ clearly vanishes in the limit, uniformly in $\omega_1$ and $\omega_2$.

Next we show that the term involving $R(u + v_1) R(u - v_2)$ also vanishes in the limit, uniformly in $\omega_1$ and $\omega_2$. For this term is less than

$$B_T\int_0^T dv_1\int_0^T dv_2\int_{-T}^{T} du\,|k(B_T v_1) k(B_T v_2) R(u + v_1) R(u - v_2)|.$$

Making the change of variable $v_1 = z_1 - v_2$, $u = z + v_2$, this becomes

$$B_T\int_0^T dv_2\int_{v_2}^{T + v_2} dz_1\int_{-T - v_2}^{T - v_2} dz\,|k(B_T v_2) k(B_T z_1 - B_T v_2) R(z) R(z + z_1)|.$$

Making the change of variable $z_2 = B_T v_2$, this becomes

$$\int_0^{B_T T} dz_2\int_{z_2/B_T}^{T + (z_2/B_T)} dz_1\int_{-T(1 + (z_2/T B_T))}^{T(1 - (z_2/T B_T))} dz\,|k(z_2) k(B_T z_1 - z_2) R(z) R(z + z_1)|,$$

which tends to 0 as $T \to \infty$, since the region of integration over the $z_1$ variable tends to infinity.





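The value of (5.6) is then given by the limit of

$$\frac{1}{\pi^2}\,B_T\int_0^T\!\!\int_0^T dv_1\,dv_2\,\cos\omega_1 v_1\cos\omega_2 v_2\,k(B_T v_1) k(B_T v_2)\int_{-T}^{T} du\,U_T(u, v_1, v_2) R(u) R(u + v_1 - v_2). \tag{5.17}$$

By the change of variables $u_1 = v_1 - v_2$, $u_2 = v_2$, this becomes

$$\frac{1}{\pi^2}\,B_T\int_0^T du_2\int_{-u_2}^{T - u_2} du_1\,\cos\omega_1(u_1 + u_2)\cos\omega_2 u_2\,k(B_T u_2) k(B_T u_2 + B_T u_1)\int_{-T}^{T} du\,U_T(u, u_1 + u_2, u_2) R(u) R(u + u_1). \tag{5.18}$$

By the change of variable $z = B_T u_2$, and the formula $2\cos A\cos B = \cos(A + B) + \cos(A - B)$, one obtains that (5.18) is equal to

$$\frac{1}{2\pi^2}\int_0^{B_T T} dz\int_{-(z/B_T)}^{T(1 - (z/T B_T))} du_1\left\{\cos\left[\frac{z}{B_T}(\omega_1 - \omega_2) + \omega_1 u_1\right] + \cos\left[\frac{z}{B_T}(\omega_1 + \omega_2) + \omega_1 u_1\right]\right\} k(z) k(z + B_T u_1)\int_{-T}^{T} du\,U_T\!\left(u, u_1 + \frac{z}{B_T}, \frac{z}{B_T}\right) R(u) R(u + u_1). \tag{5.19}$$

By referring to (3.11), it may be verified that, as $T \to \infty$, $U_T(u, u_1 + (z/B_T), z/B_T) \to 1$.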


Now to evaluate (5.19), one may distinguish three cases: case I, $\omega_1 \neq \omega_2$; case II, $\omega_1 = \omega_2 = \omega \neq 0$; case III, $\omega_1 = \omega_2 = 0$. In view of the Riemann-Lebesgue Lemma, the first term in (5.19) vanishes in the limit if $\omega_1 - \omega_2 \neq 0$, and the second term vanishes in the limit if $\omega_1 + \omega_2 \neq 0$. Further, for any $\epsilon > 0$, the convergence to 0 is uniform in $\omega_1$ and $\omega_2$ such that $\omega_1 \ge \epsilon$ and $\omega_2 \ge \epsilon$. Thus one obtains that, in the limit, the value of (5.19) is 0 in case I; in case II, it is equal to

$$\frac{1}{2\pi^2}\int_0^{\infty} k^2(z)\,dz\int_{-\infty}^{\infty} du_1\,\cos\omega u_1\int_{-\infty}^{\infty} du\,R(u) R(u + u_1); \tag{5.20}$$

and, in case III, it is equal to twice (5.20). It is easily verified that (5.20) and (5.6) are equal.

To adapt the foregoing argument to the discrete parameter case requires some care in the phase of the argument following (5.19). The integration in (5.19) involving the variable $z$ should be replaced by a summation over the lattice points $z_j = j B_T$, where $j = 1, \ldots, T$. As $T \to \infty$, the distance between the lattice points tends to 0, and the highest lattice point tends to infinity, so that the summation may be regarded as approaching the integral $\int_0^\infty k^2(z)\,dz$, as above.


6. Optimum consistent estimates of the spectral density. In view of Theorems 5A and 5B, the means are now at hand for choosing that estimate $f_T^*(\omega)$, of the form of (5.4), which is optimum in the sense that it possesses an order of consistency not less than that of any other such estimate. We obtain the following theorem.

THEOREM 6: Suppose that (5.7) holds. Let the constants $B_T$ be chosen so that, for some finite positive number $B$,

$$\lim_{T\to\infty} T^{\alpha/q} B_T = B, \tag{6.1}$$

where

$$\alpha = \frac{q}{1 + 2q}. \tag{6.2}$$

Then for any covariance averaging kernel $k(v)$ with characteristic exponent $r \ge q$ the corresponding estimate $f_T^*(\omega)$ possesses an asymptotic mean square error given by

$$\lim_{T\to\infty} T^{2\alpha} E|f_T^*(\omega) - f(\omega)|^2 = \frac{f^2(\omega)}{B}\int_{-\infty}^{\infty} k^2(z)\,dz\,\{1 + \delta(0, \omega)\} + B^{2q}\,|k^{(q)} f^{(q)}(\omega)|^2. \tag{6.3}$$

Remark. If $q < r$, then $k^{(q)} = 0$.

Now, as $q$ increases, the exponent $\alpha$, as defined by (6.2), increases from 0 to $\frac{1}{2}$. Thus the factor which determines the highest order of consistency which may be attained is the largest positive number $q$ such that (5.7) holds. For want of a better name, we call this largest $q$ the exponent of uncorrelation of the time series whose covariance function is $R(v)$, since the larger $q$ is, the faster $R(v)$ falls off as $v \to \infty$, and the less correlated are successive observations of the time series. If (5.7) holds for all finite values of $q$, as is the case if $R(v)$ decreases exponentially, we define the exponent of uncorrelation to be infinite.
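The trade-off described by (6.2) can be tabulated directly. This Python sketch is only an illustration with hypothetical function names: it computes the consistency exponent $\alpha = q/(1 + 2q)$ and the rate $1/(1 + 2q)$ at which (6.1) forces $B_T$ to zero (note that $1/(1 + 2q) = \alpha/q = 1 - 2\alpha$), and prints the implied mean-square-error rate $T^{-2\alpha}$ for a few values of the exponent of uncorrelation.

```python
import math


def consistency_exponent(q):
    """alpha = q / (1 + 2q), equation (6.2); tends to 1/2 as q grows."""
    if math.isinf(q):
        return 0.5
    return q / (1.0 + 2.0 * q)


def bandwidth_exponent(q):
    """Exponent e with B_T of order T**(-e) under (6.1): e = 1/(1 + 2q) = 1 - 2*alpha."""
    return 1.0 / (1.0 + 2.0 * q)


for q in (0.5, 1.0, 2.0, 4.0, 16.0):
    a = consistency_exponent(q)
    print(f"q = {q:5.1f}: alpha = {a:.4f}, B_T ~ T^-{bandwidth_exponent(q):.4f}, "
          f"mean square error ~ T^-{2 * a:.4f}")
```

The table makes the point of the paragraph visible: a larger exponent of uncorrelation allows a wider lag window (slower decrease of $B_T$) and a faster mean square error rate, but never beyond $T^{-1}$.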


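For computational convenience, the kernel with characteristic exponent $r$ that we prefer is

$$k_r(z) = \begin{cases} 1 - |z|^r & \text{if } |z| \le 1,\\ 0 & \text{otherwise}.\end{cases} \tag{6.4}$$

Such a kernel leads to an estimate which does not require the computation of all the sample covariances. With this choice of kernel, $f_T^*(\omega)$ may be written, letting $M_T = (1/B_T) \le T$,

$$f_T^*(\omega) = \frac{1}{2\pi}\sum_{|v| \le M_T} e^{-i\omega v}\left\{1 - \left(\frac{|v|}{M_T}\right)^r\right\} R_T(v). \tag{6.5}$$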
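A direct implementation of (6.5) is short. The following Python sketch assumes a zero-mean series, the sample covariance $R_T(v) = (1/T)\sum_t x_t x_{t+|v|}$, and an integer number of lags $M_T$; the function names are hypothetical. Only the sample covariances up to lag $M_T$ are computed, which is the computational point made above.

```python
import cmath
import math


def k_r(z, r):
    """Kernel (6.4): 1 - |z|**r for |z| <= 1, else 0. With r = math.inf, Python gives
    |z|**inf = 0 for |z| < 1 and 1 at |z| = 1, i.e. the truncated kernel."""
    if abs(z) > 1.0:
        return 0.0
    return 1.0 - abs(z) ** r


def sample_covariances(x, max_lag):
    """R_T(v) = (1/T) sum_t x[t] x[t + v] for v = 0..max_lag (zero mean assumed)."""
    T = len(x)
    return [sum(x[t] * x[t + v] for t in range(T - v)) / T for v in range(max_lag + 1)]


def spectral_estimate(x, omega, M_T, r):
    """Estimate (6.5): (1/2pi) sum_{|v| <= M_T} e^{-i omega v} k_r(v / M_T) R_T(v)."""
    if not 1 <= M_T < len(x):
        raise ValueError("need 1 <= M_T < T")
    R = sample_covariances(x, M_T)  # only lags 0..M_T are ever needed
    total = 0j
    for v in range(-M_T, M_T + 1):
        total += cmath.exp(-1j * omega * v) * k_r(v / M_T, r) * R[abs(v)]
    return (total / (2.0 * math.pi)).real  # imaginary part cancels since R_T is even


x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]  # hypothetical short series
print(spectral_estimate(x, 0.5, 3, 2))
```

With $r = 1$ this is the modified Bartlett estimate, and letting $r$ grow gives the truncated estimate in the limit.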


The foregoing results may be interpreted from two points of view, emphasizing either the choice of kernel $k(z)$ or the choice of constants $M_T = 1/B_T$ (which, in the case of a kernel vanishing for $|z| > 1$, represent the number of sample covariances included in the estimate).

Let a kernel $k(z)$ be chosen whose characteristic exponent is $r$. Then the order of consistency of the corresponding estimate cannot be greater than $T^{\alpha(r)}$, where $\alpha(r) = r/(1 + 2r)$, and will be $T^{\alpha}$, where $\alpha \le \alpha(r)$, if the constants $M_T$ satisfy the relation, for some finite positive number $M$,

$$\lim_{T\to\infty} \frac{M_T}{T^{1-2\alpha}} = M, \tag{6.6}$$

and if (5.7) holds for $q = \alpha/(1 - 2\alpha)$.

Therefore, if Bartlett's modified periodogram (which is (6.5) with $r = 1$) is used as the estimate, its order of consistency cannot be greater than $T^{1/3}$, and will be $T^{\alpha}$ (where $\alpha \le \frac{1}{3}$) if the number of sample covariances included in the estimate is $T^{1-2\alpha}$. If the truncated periodogram (which is (6.5) with $r = \infty$) is used as the estimate, its order of consistency will be $T^{\alpha}$ (where $\alpha < \frac{1}{2}$) if $M_T = T^{1-2\alpha}$, and if (5.7) holds for $q = \alpha/(1 - 2\alpha)$, which would be the case if the exponent of uncorrelation is infinite.

On the other hand, let the constants $M_T$ be chosen so that (6.6) holds for some $\alpha$ between 0 and $\frac{1}{2}$. Then the order of consistency of the corresponding estimate $f_T^*(\omega)$ is $T^{\alpha}$, no matter what the value of the characteristic exponent $r$ of the kernel used, so long as $r \ge q(\alpha) = \alpha/(1 - 2\alpha)$ and (5.7) holds for $q = q(\alpha)$.
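The two points of view can be turned into a small planning helper. This Python sketch uses hypothetical names and an assumed rounding rule: it inverts (6.2) to get $q(\alpha) = \alpha/(1 - 2\alpha)$, sizes the number of lags as $M_T \approx M\,T^{1-2\alpha}$ in the spirit of (6.6), and checks whether a kernel of characteristic exponent $r$ can reach a target order $T^{\alpha}$.

```python
import math


def max_order(r):
    """alpha(r) = r / (1 + 2r): best exponent attainable with characteristic exponent r."""
    return 0.5 if math.isinf(r) else r / (1.0 + 2.0 * r)


def required_uncorrelation(alpha):
    """q(alpha) = alpha / (1 - 2 alpha): (5.7) must hold for this q to reach order T**alpha."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    return alpha / (1.0 - 2.0 * alpha)


def number_of_lags(T, alpha, M=1.0):
    """M_T of order M * T**(1 - 2 alpha), as in (6.6); rounding to an integer is an assumption."""
    return max(1, round(M * T ** (1.0 - 2.0 * alpha)))


def order_is_attainable(alpha, r):
    """True when r >= q(alpha), i.e. the kernel does not cap the order below T**alpha."""
    return r >= required_uncorrelation(alpha)


T = 10000
for alpha in (0.2, 0.25, 1 / 3, 0.4):
    print(f"alpha = {alpha:.3f}: q(alpha) = {required_uncorrelation(alpha):.3f}, "
          f"M_T = {number_of_lags(T, alpha)}, Bartlett ok: {order_is_attainable(alpha, 1)}")
```

For Bartlett's kernel ($r = 1$) the printout stops allowing targets above $\alpha = \frac{1}{3}$, matching the statement about the modified periodogram.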


7. Alternative estimates of the spectral averages. In our study of the consistent estimates of the spectral density, we were led to consider estimates, such as Bartlett's modified periodogram, which had the property of only requiring the computation, on the basis of an observed sample of length $T$, of the sample covariances $R_T(v)$ for $|v|$ less than some root of $T$. In this section we show that for the spectral averages $J(A)$, one may define estimates $J_T^*(A)$, alternative to the previously given estimates $J_T(A)$, which have the same order of consistency and asymptotic variance as the latter, and require the computation of fewer sample covariances.

Let $A(\omega)$ be a spectral averaging function, with Fourier transform $a(v)$. Let $k(z)$ be a covariance averaging kernel, with Fourier transform $K(\omega)$. Let $B_T$ be a sequence of constants tending to 0. Let $f_T^*(\omega)$ be defined by (5.4). Define

$$J_T^*(A) = \int f_T^*(\omega) A(\omega)\,d\omega. \tag{7.1}$$

One may write $J_T^*(A)$ in terms of the sample spectral density function by

$$J_T^*(A) = \int f_T(\omega) A_T(\omega)\,d\omega, \tag{7.2}$$

where


l 
A,lw) = = 







In terms of the sample covariance functions, one may write

$$J_T^*(A) = \int_{-T}^{T} a(v) k(B_T v) R_T(v)\,dv \tag{7.3c}$$

$$J_T^*(A) = \sum_{|v| \le T} a(v) k(B_T v) R_T(v). \tag{7.3d}$$
The properties of the estimate $J_T^*(A)$ are embodied in the following two theorems, whose proofs are omitted.

THEOREM 7A. For any two spectral averaging functions $A_1(\omega)$ and $A_2(\omega)$, the covariance $\operatorname{Cov}[J_T^*(A_1), J_T^*(A_2)]$ satisfies (4.11).

THEOREM 7B. Let $a(v)$ be a covariance averaging function. Let $q > \frac{1}{2}$ be such that

$$\int_{-\infty}^{\infty} |v|^q\,|a(v) R(v)|\,dv < \infty. \tag{7.4}$$

Let $k(z)$ be a covariance averaging kernel with characteristic exponent $r \ge q$. Let the positive constants $B_T$ be chosen so that

$$\limsup_{T\to\infty} T^{1/2} B_T^{q}\ \begin{cases} = 0 & \text{if } r = q,\\ < \infty & \text{if } r > q.\end{cases} \tag{7.5}$$

Then the bias $E J_T^*(A) - J(A)$ satisfies (4.10).

Optimum Estimates: The estimates $J_T^*(A)$ are all equivalent from the point of view of their order of consistency and asymptotic variance. If one desires to choose between them, the only basis is computational convenience. It is with this in mind that the following remarks are made. For the covariance averaging kernel, we choose $k_r(z)$. Then (7.3d) becomes, letting $M_T = 1/B_T$,

$$J_T^*(A) = \sum_{|v| \le M_T} a(v)\left\{1 - \left(\frac{|v|}{M_T}\right)^r\right\} R_T(v). \tag{7.7}$$

We choose $B_T$ to be of the form $B_T = T^{-m}$, where the positive exponent $m$ is to be chosen as small as possible, so that the number of terms in (7.7) will be as small as possible. Let $q$ be the largest positive number such that (7.4) holds. Assuming $q$ to be finite, choose $r = q$. Then $J_T^*(A)$ will give a consistent estimate of $J(A)$, involving the calculation of a minimum number of sample covariances, if $m$ is chosen as near to the lower bound as possible in the inequalities

$$m > \frac{1}{2q} \ \ \text{if } r = q, \qquad m \ge \frac{1}{2q} \ \ \text{if } r > q. \tag{7.8}$$
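The choice of $m$ in (7.8) and the estimate (7.7) can be sketched in a few lines of Python. The function names and the small `slack` added when $r = q$ (to make the inequality strict) are hypothetical; the sample covariances are supplied as a list `R` with `R[v]` $= R_T(|v|)$.

```python
import math


def lag_exponent_bound(q):
    """Lower bound 1/(2q) for m in (7.8); Theorem 7B's q > 1/2 keeps it below 1."""
    if q <= 0.5:
        raise ValueError("Theorem 7B assumes q > 1/2")
    return 1.0 / (2.0 * q)


def lags_for_average(T, q, r, slack=0.01):
    """M_T = 1/B_T = T**m with m just above the bound if r == q, at the bound if r > q."""
    m = lag_exponent_bound(q) + (slack if r == q else 0.0)
    return math.ceil(T ** m)


def spectral_average_estimate(a, R, M_T, r):
    """Estimate (7.7): sum_{|v| <= M_T} a(v) {1 - (|v|/M_T)**r} R_T(v)."""
    return sum(a(v) * (1.0 - (abs(v) / M_T) ** r) * R[abs(v)]
               for v in range(-M_T, M_T + 1))


T = 10000
print(lags_for_average(T, 1.0, 2.0), lags_for_average(T, 1.0, 1.0))
```

The point of the design is visible in the output: with $q = 1$ only about $\sqrt{T}$ sample covariances are needed, far fewer than the $T$ available.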
8. Interpolating the spectral density. In order to obtain an estimate of the complete graph of the spectral density function $f(\omega)$ by means of the estimates $f_T^*(\omega)$ discussed in the foregoing, one needs to compute the estimate at all values of $\omega$. In this section, estimates $f_T^{**}(\omega)$ are constructed, which are equivalent to $f_T^*(\omega)$ from the point of view of order of consistency and asymptotic variance, and which require the computation of only a finite number of quantities in order to obtain the entire graph. Only the discrete parameter case is discussed in detail.
To begin with, define

$$\omega_m(T) = \frac{2\pi m}{2T + 1}, \qquad m = 0, \pm 1, \ldots, \pm T. \tag{8.1}$$


We now show that $f_T^*(\omega)$, as defined by (5.4), may be expressed in terms of its values at the above $(2T + 1)$ lattice points by the formula

$$f_T^*(\omega) = \sum_{m=-T}^{T} c_m(\omega; T)\,f_T^*(\omega_m(T)), \tag{8.2}$$

where

$$c_m(\omega; T) = \frac{1}{2T + 1}\sum_{v=-T}^{T}\exp[-iv(\omega - \omega_m(T))] = \frac{\sin[\frac{1}{2}(2T + 1)(\omega - \omega_m(T))]}{(2T + 1)\sin[\frac{1}{2}(\omega - \omega_m(T))]}. \tag{8.3}$$

To prove (8.2), we note that, for $v = 0, \pm 1, \ldots, \pm T$ and any $\omega$,

$$e^{-iv\omega} = \sum_{m=-T}^{T} c_m(\omega; T)\exp[-iv\omega_m(T)], \tag{8.4}$$

which may be verified by expanding the right-hand side. It is now easy to obtain (8.2) by substituting (8.4) into (5.4).

If $f_T^*(\omega)$ is given by (6.5), then it is determined by its values at an even smaller number of points, namely the lattice points $\omega_m(M_T)$, since by the same argument as above we may write

$$f_T^*(\omega) = \sum_{|m| \le M_T} c_m(\omega; M_T)\,f_T^*(\omega_m(M_T)). \tag{8.5}$$
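The interpolation identity (8.5) is exact for any lag-window estimate that uses at most $N$ lags, which is easy to confirm numerically. The Python sketch below builds an estimate of the form (6.5) with $M_T = N$ from a hypothetical list of sample covariances, samples it at the lattice points $\omega_m(N)$ of (8.1), and rebuilds it elsewhere with the weights $c_m(\omega; N)$ of (8.3). All names are illustrative.

```python
import cmath
import math


def lattice(N):
    """omega_m(N) = 2 pi m / (2N + 1) for m = -N..N, as in (8.1)."""
    return {m: 2.0 * math.pi * m / (2 * N + 1) for m in range(-N, N + 1)}


def c_m(omega, m, N):
    """Weight (8.3) from its exponential-sum form; the sum is real by symmetry."""
    x = omega - 2.0 * math.pi * m / (2 * N + 1)
    s = sum(cmath.exp(-1j * v * x) for v in range(-N, N + 1))
    return (s / (2 * N + 1)).real


def c_m_closed(omega, m, N):
    """Closed form of (8.3); valid when omega - omega_m(N) is not a multiple of 2 pi."""
    x = omega - 2.0 * math.pi * m / (2 * N + 1)
    return math.sin(0.5 * (2 * N + 1) * x) / ((2 * N + 1) * math.sin(0.5 * x))


def trig_estimate(omega, R, N, r):
    """Form (6.5) with M_T = N: (1/2pi)[R_0 + 2 sum_v (1 - (v/N)**r) R_v cos(omega v)]."""
    total = R[0]
    for v in range(1, N + 1):
        total += 2.0 * (1.0 - (v / N) ** r) * R[v] * math.cos(omega * v)
    return total / (2.0 * math.pi)


def interpolate(omega, values, N):
    """Right-hand side of (8.5): sum_m c_m(omega; N) * f(omega_m(N))."""
    return sum(c_m(omega, m, N) * values[m] for m in range(-N, N + 1))


N = 4
R = [1.0, 0.6, -0.2, 0.1, 0.05]  # hypothetical sample covariances R_T(0..N)
values = {m: trig_estimate(w, R, N, 2) for m, w in lattice(N).items()}
for omega in (0.3, 1.234, 2.9):
    print(omega, trig_estimate(omega, R, N, 2), interpolate(omega, values, N))
```

Only the $2N + 1$ lattice values are stored; every other value is recovered from them, which is exactly why the estimate is "determined by its values" at the lattice points.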

Thus it is seen that it suffices to compute $f_T^*(\omega)$ at a finite number of points in order to know it on the entire interval $0 \le \omega \le \pi$. In view of the peaked nature of the $c_m(\omega; T)$ functions for large $T$, it might be thought that an adequate approximation to $f_T^*(\omega)$ would be $f_T^*(\omega_m(T))$, where $\omega_m(T)$ is the lattice point nearest to $\omega$. The problem raised by the representations (8.2) and (8.5) is when such an approximation is valid. From a statistical point of view, the justification of such an approximation must be in terms of its providing an estimate which has the proper order of consistency and asymptotic variance. It is from this point of view that we now consider the problem of using the value of $f_T^*(\omega)$ at a finite number of points to obtain estimates of its value at all points.

Let $d_T$ be a sequence of constants tending to 0, and define the function, for $\omega \ge 0$,


$$W_T(\omega) = \left[\frac{\omega}{d_T}\right] d_T, \tag{8.6}$$

where $[x]$ denotes the largest integer not exceeding $x$. Consider the following estimate of the spectral density function

$$f_T^{**}(\omega) = f_T^*(W_T(\omega)), \tag{8.7}$$

where $f_T^*(\omega)$ is defined by (5.4). This estimate clearly has only a finite number of distinct values. The properties of this estimate are embodied in the following theorem.

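THEOREM 8: Assume that the conditions of Theorem 6 are fulfilled, so that the estimate $f_T^*(\omega)$ is consistent of order $T^{\alpha}$, with asymptotic variance given by (6.3). Let

$$\beta = \frac{\alpha}{q} = \frac{1}{1 + 2q}. \tag{8.8}$$

Let the positive constants $d_T$ be chosen such that

$$\limsup_{T\to\infty} T^{1-2\alpha} d_T < \infty \quad \text{if } 0 < q < 1, \text{ whence } 0 < \alpha < \tfrac{1}{3}, \tag{8.9}$$

$$\lim_{T\to\infty} T^{\alpha} d_T = 0 \quad \text{if } q \ge 1, \text{ whence } \tfrac{1}{3} \le \alpha < \tfrac{1}{2}. \tag{8.10}$$

Then the estimate $f_T^{**}(\omega)$ is consistent of order $T^{\alpha}$, with the same asymptotic bias and asymptotic variance as $f_T^*(\omega)$.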

PROOF. One may suppose $\omega > 0$, since $f_T^{**}(0) = f_T^*(0)$. Now $\omega - d_T \le W_T(\omega) \le \omega$, so that $W_T(\omega) \to \omega$. In view of the uniform convergence in (5.6), it follows that the asymptotic variance of $f_T^{**}(\omega)$ is the same as that of $f_T^*(\omega)$. Next, in view of the uniform convergence in (5.10), to establish that $f_T^{**}(\omega)$ has the same asymptotic bias as that of $f_T^*(\omega)$ it suffices to show that

$$\lim_{T\to\infty} T^{\alpha}\,|f(W_T(\omega)) - f(\omega)| = 0. \tag{8.11}$$

Now the quantity in (8.11) whose limit is being taken is less than

$$T^{\alpha} d_T\sum_{|v| \le T^{\beta}} |v|\,|R(v)| + 2T^{\alpha}\sum_{|v| > T^{\beta}} |R(v)|.$$

The second term tends to 0, since it is less than

$$2\sum_{|v| > T^{\beta}} |v|^q\,|R(v)|.$$

The first term also tends to 0: by (8.10) if $q \ge 1$, and by Lemma 4 and (8.9) if $q < 1$, since, because $\alpha + \beta(1 - q) = 1 - 2\alpha$, the term may be written

$$T^{1-2\alpha} d_T\,\frac{1}{T^{\beta(1-q)}}\sum_{|v| \le T^{\beta}} |v|\,|R(v)|.$$
If $d_T$ is chosen by

$$d_T = \pi T^{-1/2}, \tag{8.12}$$

then (8.9) and (8.10) will be satisfied for $\frac{1}{4} \le \alpha < \frac{1}{2}$ (which corresponds to $\frac{1}{2} \le q < \infty$). It would seem that (8.12) provides a safe universal choice of the spacing of the lattice points.
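The lattice rounding (8.6) and the spacing rule can be sketched in Python as follows. The sketch reads (8.12) as $d_T = \pi T^{-1/2}$ (an assumption stated in the code), takes the estimate $f_T^*$ as any callable, and uses hypothetical function names. The helper `spacing_is_safe` simply re-derives which values of $\alpha$ satisfy (8.9) or (8.10) under that spacing.

```python
import math


def spacing(T):
    """d_T = pi * T**(-1/2); this reading of (8.12) is an assumption of the sketch."""
    return math.pi / math.sqrt(T)


def lattice_point(omega, d):
    """W_T(omega) = d * floor(omega / d), equation (8.6), for omega >= 0."""
    return d * math.floor(omega / d)


def spacing_is_safe(alpha):
    """Does d_T = pi T^{-1/2} meet (8.9) (alpha < 1/3) or (8.10) (alpha >= 1/3)?"""
    if not 0.0 < alpha < 0.5:
        return False
    if alpha < 1.0 / 3.0:
        return 1.0 - 2.0 * alpha <= 0.5  # T^{1-2 alpha} * T^{-1/2} must stay bounded
    return alpha < 0.5                  # T^{alpha} * T^{-1/2} must tend to 0


def piecewise_estimate(f_star, omega, T):
    """f_T**(omega) = f_T*(W_T(omega)), equation (8.7); f_star is any callable estimate."""
    return f_star(lattice_point(omega, spacing(T)))


print([a for a in (0.2, 0.25, 0.3, 1 / 3, 0.4, 0.45) if spacing_is_safe(a)])
```

Since only the lattice points are ever passed to `f_star`, the estimate needs about $\pi/d_T + 1 = \sqrt{T} + 1$ evaluations to cover $0 \le \omega \le \pi$.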





If it is desired that the estimate of the spectral density be a continuous function, without jumps, then one may use the estimate

$$a_T f_T^*(W_T(\omega)) + (1 - a_T) f_T^*(W_T(\omega) + d_T), \tag{8.13}$$

where $a_T$ is a sequence of constants between 0 and 1, not in general approaching a limit. It may be verified that Theorem 8 holds for the estimate given by (8.13), provided that in (8.9) it is required that the limit be 0. Then it follows that $(d_T/B_T) \to 0$, and

$$\lim_{T\to\infty} T^{2\alpha}\operatorname{Cov}[f_T^*(W_T(\omega)), f_T^*(W_T(\omega) + d_T)] = \lim_{T\to\infty} T^{2\alpha}\operatorname{Var}[f_T^*(\omega)].$$
REFERENCES
[1] M. S. Bartlett, "Periodogram analysis and continuous spectra," Biometrika, Vol. 37 (1950), pp. 1-16.
[2] M. S. Bartlett, An Introduction to Stochastic Processes, Cambridge, 1955.
[3] U. Grenander, "On empirical spectral analysis of stochastic processes," Ark. Mat., Vol. 1 (1951), pp. 503-531.
[4] U. Grenander and M. Rosenblatt, "Statistical spectral analysis of time series arising from stationary stochastic processes," Ann. Math. Stat., Vol. 24 (1953), pp. 537-559.
[5] U. Grenander and M. Rosenblatt, "Regression analysis of time series with stationary residuals," Proc. Nat. Acad. Sci., Vol. 40 (1954), pp. 812-816.
[6] T. A. Magness, "Spectral response of a quadratic device to non-Gaussian noise," J. Appl. Phys., Vol. 25 (1954), pp. 1357-1365.
[7] J. Tukey, "Measuring Noise Color," an unpublished manuscript prepared for distribution at a meeting of the Institute of Radio Engineers, November 7, 1951.
[8] E. Parzen, Optimum Methods of Spectral Analysis of Finite Noise Samples, Columbia University, Hudson Laboratories Technical Report No. 39, issued Summer 1956.
[9] J. L. Doob, Stochastic Processes, New York, 1953.
[10] M. Loève, Probability Theory, New York, 1955.





INTRA-BLOCK ANALYSIS FOR FACTORIALS IN TWO-ASSOCIATE CLASS GROUP DIVISIBLE DESIGNS¹,²


By Clyde Young Kramer and Ralph Allan Bradley
Virginia Polytechnic Institute 


1. Introduction and summary. Group divisible incomplete block designs 
form an important class of incomplete block designs useful in a wide variety of 
experimental situations. Their properties, construction, and analysis have been 
thoroughly discussed in statistical literature, and we cite only several recent
references [1], [2], and [3] to work of Bose and his co-workers dealing with par- 
tially balanced designs with two associate classes with which we shall be con- 
cerned. 

The utility of incomplete block designs would be increased with means of in- 
corporating factorial treatment combinations in them. The use of factorials is 
widespread and stimulated by the concepts of confounding, partial confounding, 
and fractional replication. A mathematical summary on factorials is given by 
Kempthorne [4]. Kramer and Bradley [5] considered factorials in near-balance 
incomplete block designs, and here we generalize to the wider class of group 
divisible designs with two associate classes. Harshbarger [6] used a 2* factorial 
in a Latinized rectangular lattice and this seems to be the first use of a factorial 
in a partially balanced incomplete block design. 

We obtain the intra-block analysis of variance for two-associate class group 
divisible designs with the adjusted treatment sum of squares in a modified form 
that more clearly indicates the structure of that quantity. Factorial treatment 
combinations are then identified with basic treatments through the association 
scheme of a design. This identification is effected in such a way that the factors 
are divided into two groups. For example, the design for 18 treatments (see [2], 
Design 860), divisible into six groups of three, in blocks of six, treatments replicated five times, can be adapted to a 6 × 3 factorial scheme; by regarding the six groups as made up of a 2 × 3 classification, the same design can be used for a 2 × 3² factorial scheme. Single-degree-of-freedom comparisons are obtainable
in much the usual way and use of fractional replication, essentially within the 
groups of factors, is possible. The analyses for factorials depend on the estimators 
of basic treatment effects. 

We are not concerned with the construction of two-associate class group divisible designs, and all known such designs for which $r \le 10$, $3 < k \le 10$, where $r$ is the number of replications and $k$ is the number of plots per block, are given in [7].


Received July 23, 1956; revised October 12, 1956. 

1 Research sponsored by the Agricultural Research Service, U.S.D.A., under a Research 
and Marketing Act Contract, No. 12-14-100-126(20). 

2 A condensation of part of a dissertation by C. Y. Kramer, written under the direction 
of R. A. Bradley and submitted to the Virginia Polytechnic Institute in partial fulfillment 
of the requirements for the Ph.D. degree in Statistics. 


349 





350 CLYDE YOUNG KRAMER AND RALPH ALLAN BRADLEY 


2. Definitions and notation. Bose, Clatworthy, and Shrikhande [7] list the 
following properties of group divisible designs with two associate classes: 

(i) The experimental material is divided into b blocks of k units each; dif- 
ferent treatments are applied to the different units in a block. 

(ii) There are v = mn treatments (v > k) and the treatments can be divided 
into m groups of n each such that any two treatments of the same group are 
first associates while two treatments from different groups are second asso- 
ciates. Each treatment occurs in the design r times, and vr = bk. 

(iii) Each treatment has exactly (n — 1) first associates and n(m — 1) second 
associates. 

(iv) Given any two treatments which are $i$th associates, the number of treatments common to the $j$th associates of the first and the $k$th associates of the second is $p^i_{jk}$ $(i, j, k = 1, 2)$, and is independent of the pair of treatments selected. In matrix notation, if $P_i$ is the matrix with elements $p^i_{jk}$,

$$P_1 = \begin{pmatrix} n - 2 & 0\\ 0 & n(m - 1)\end{pmatrix} \quad\text{and}\quad P_2 = \begin{pmatrix} 0 & n - 1\\ n - 1 & n(m - 2)\end{pmatrix}.$$


(v) Two treatments which are $i$th associates occur together in exactly $\lambda_i$ blocks, $i = 1, 2$.

(vi) The inequalities $r \ge \lambda_1$, $rk - v\lambda_2 \ge 0$ hold.

(vii) The design parameters are related so that $(n - 1)\lambda_1 + n(m - 1)\lambda_2 = r(k - 1)$, or $rk - v\lambda_2 = r - \lambda_1 + n(\lambda_1 - \lambda_2)$. Group divisible designs have been divided into three subclasses, Singular, Semi-regular, and Regular, but we shall consider the class as a whole without subdivision.
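The relations (ii), (vi), and (vii) are easy to check mechanically for a candidate set of parameters. The Python sketch below (hypothetical helper names) tests them and searches for the integer $\lambda_1, \lambda_2$ allowed by (vii). As an example it uses the 18-treatment design mentioned in Section 1 ($v = 18$, $m = 6$, $n = 3$, $k = 6$, $r = 5$, hence $b = vr/k = 15$); its $\lambda$ values are not stated in the text, and the search shows that (vii) leaves only $\lambda_1 = 5$, $\lambda_2 = 1$.

```python
def check_gd_parameters(v, b, r, k, m, n, lam1, lam2):
    """Which of the stated relations hold for a candidate two-associate GD design."""
    return {
        "v = mn": v == m * n,
        "vr = bk": v * r == b * k,
        "(vi) r >= lambda1": r >= lam1,
        "(vi) rk - v*lambda2 >= 0": r * k - v * lam2 >= 0,
        "(vii)": (n - 1) * lam1 + n * (m - 1) * lam2 == r * (k - 1),
        "(vii) second form": r * k - v * lam2 == r - lam1 + n * (lam1 - lam2),
    }


def lambdas_from_vii(r, k, m, n):
    """All integer pairs 0 <= lambda1, lambda2 <= r satisfying relation (vii)."""
    return [(l1, l2) for l1 in range(r + 1) for l2 in range(r + 1)
            if (n - 1) * l1 + n * (m - 1) * l2 == r * (k - 1)]


print(lambdas_from_vii(5, 6, 6, 3))
print(check_gd_parameters(18, 15, 5, 6, 6, 3, 5, 1))
```

Here $\lambda_1 = r$, so this particular design falls in the Singular subclass, which the analysis below does not need to distinguish.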

We let $V_{ij}$ denote the $j$th treatment of the $i$th group noted in (ii), $i = 1, \ldots, m$; $j = 1, \ldots, n$. Then the usual association scheme is given by the matrix $V$ with elements $V_{ij}$. Two treatments with common first subscripts (in the same row of $V$) are first associates; otherwise they are second associates. The double
subscript notation is introduced here for it will be convenient when we come 
to consider factorials. To use the design catalogue [7] it is only necessary to 
match our treatment designations with those in the association matrices where 
treatments are numbered serially. 
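The association scheme defined by the rows of $V$ can be checked against the matrices $P_1$ and $P_2$ of (iv) by brute force. In this Python sketch (hypothetical names), a treatment $V_{ij}$ is the pair `(i, j)`, and for every pair of distinct treatments the counts $p^i_{jk}$ are tabulated; property (iv) says each associate class should produce a single matrix.

```python
from itertools import product


def association(a, b):
    """a, b are labels (i, j) of V_ij: 1 = first associates (same row of V), 2 = second."""
    return 1 if a[0] == b[0] else 2


def p_matrices(m, n):
    """For i = 1, 2, the set of 2x2 count matrices p^i_{jk} observed over all pairs
    of ith associates; a single matrix per class means the counts are pair-independent."""
    treatments = list(product(range(m), range(n)))
    seen = {1: set(), 2: set()}
    for x in treatments:
        for y in treatments:
            if x == y:
                continue
            counts = [[0, 0], [0, 0]]
            for z in treatments:
                if z != x and z != y:
                    counts[association(x, z) - 1][association(y, z) - 1] += 1
            seen[association(x, y)].add(tuple(map(tuple, counts)))
    return seen


print(p_matrices(6, 3))
```

For $m = 6$, $n = 3$ this reproduces $P_1 = \begin{pmatrix} 1 & 0\\ 0 & 15\end{pmatrix}$ and $P_2 = \begin{pmatrix} 0 & 2\\ 2 & 12\end{pmatrix}$, as the general formulas require.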

The model that will be assumed for group divisible incomplete block designs is that

$$y_{ijs} = \mu + \tau_{ij} + \beta_s + \epsilon_{ijs}, \tag{2.1}$$

where $y_{ijs}$ is the observation on $V_{ij}$ in block $s$ if that treatment occurs in block $s$, $\mu$ is the grand mean, $\tau_{ij}$ is the effect of $V_{ij}$, $\beta_s$ is the effect of block $s$, and $\epsilon_{ijs}$ are independent normal variates with zero means and homogeneous variances, $\sigma^2$. Latin letters $m$, $t_{ij}$ and $b_s$ will be used for estimators of the parameters in (2.1). Restrictions on the parameters in (2.1) are

$$\sum_i\sum_j \tau_{ij} = 0 \tag{2.2}$$





INTRA-BLOCK ANALYSIS 


and

$$\sum_s \beta_s = 0. \tag{2.3}$$


The parameter $\beta_s$ in (2.1) may sometimes be redefined when the blocks are
arranged in replications or a Latin square [8], [9]. We shall not explicitly con- 
sider these situations since the modifications involved do not affect the estima- 
tion of the adjusted treatment sum of squares. 


3. General regression theory. Let

$$y_\alpha = \mu + \sum_{i=1}^{k}\beta_i x_{i\alpha} + \epsilon_\alpha, \tag{3.1}$$

$\alpha = 1, \ldots, N$, represent a general regression model where the $x_{i\alpha}$ are constants and the $\epsilon_\alpha$ are independent normal variates with zero means and homogeneous variances, $\sigma^2$. The $\beta_i$ are regression parameters subject to $r_1$ linearly independent restrictions,

$$\sum_{i=1}^{k} a_{hi}\beta_i = 0, \qquad h = 1, \ldots, r_1 < (k - 1), \tag{3.2}$$

defining a parameter space $\Omega$. The $a_{hi}$ in (3.2) are known constants. A null hypothesis introduces $r_2$ additional restraints through additional linearly independent equations like (3.2) for $h = r_1 + 1, \ldots, r_1 + r_2 < (k - 1)$ and thus defines a parameter space $\omega$, a subspace of $\Omega$.

The general theory of regression tests under the conditions set forth lets us state that $\text{Reg}(\beta \mid \Omega)/\sigma^2$, $\text{Reg}(\beta \mid \omega)/\sigma^2$, and $[\text{Reg}(\beta \mid \Omega) - \text{Reg}(\beta \mid \omega)]/\sigma^2$ have $\chi^2$-distributions respectively with $(k - r_1)$, $(k - r_1 - r_2)$, and $r_2$ degrees of freedom independent of $\text{Res}(\beta \mid \Omega)/\sigma^2$, which also has a $\chi^2$-distribution with $(N - k + r_1 - 1)$ degrees of freedom. $\text{Reg}(\beta \mid \Omega)$ is the sum of squares due to regression on the $x$-variables in (3.1) with the regression coefficients subject to the restraints (3.2); $\text{Res}(\beta \mid \Omega)$ is the resultant sum of squares of deviations about that regression line. $\text{Reg}(\beta \mid \omega)$ is the sum of squares due to regression on the $x$-variables in (3.1) with the regression coefficients subject to the totality of $(r_1 + r_2)$ restraints defining $\omega$. We note that


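$$\text{Reg}(\beta \mid \Omega) = \sum_{i=1}^{k} b_i g_i, \tag{3.3}$$

$$\text{Reg}(\beta \mid \omega) = \sum_{i=1}^{k} b_i' g_i, \tag{3.4}$$

$$\text{Res}(\beta \mid \Omega) = \sum_{\alpha=1}^{N}(y_\alpha - \bar{y})^2 - \sum_{i=1}^{k} b_i g_i, \tag{3.5}$$

where

$$g_i = \sum_{\alpha=1}^{N}(y_\alpha - \bar{y})\,x_{i\alpha}, \tag{3.6}$$

$b_i$ and $b_i'$ are the least squares estimators of $\beta_i$ under the restraints of $\Omega$ and $\omega$ respectively, and $\bar{y} = \sum_{\alpha=1}^{N} y_\alpha/N$. An F-test of the indicated hypothesis is possible based on

$$F = (N - k + r_1 - 1)[\text{Reg}(\beta \mid \Omega) - \text{Reg}(\beta \mid \omega)]/r_2\,\text{Res}(\beta \mid \Omega), \tag{3.7}$$

with $r_2$ and $(N - k + r_1 - 1)$ degrees of freedom.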

To illustrate the application of this theory, we consider the model (2.1) corresponding to (3.1) and the restrictions (2.2) and (2.3) corresponding to (3.2) and defining $\Omega$. Now $N$ in the regression theory is replaced by $vr$, $k$ by $(b + v)$, $r_1$ by 2, and $r_2$ by $(v - 1)$. The regression coefficients $\beta_i$ become treatment and block effects, $\tau_{ij}$ and $\beta_s$. To test the hypothesis that $\tau_{ij} = 0$ for all $i$ and $j$ in (2.1), the hypothesis of "no treatment effects", it is only necessary to add $(v - 1)$ additional linearly independent restrictions on the $\tau_{ij}$ to ensure that each $\tau_{ij} = 0$, thus defining $\omega$. The adjusted treatment sum of squares with $(v - 1)$ degrees of freedom becomes

$$\text{Reg}(\beta, \tau \mid \Omega) - \text{Reg}(\beta, \tau \mid \omega), \tag{3.8}$$

where

$$\text{Reg}(\beta, \tau \mid \Omega) = \sum_i\sum_j t_{ij} T_{ij} + \sum_s b_s B_s \tag{3.9}$$

and

$$\text{Reg}(\beta, \tau \mid \omega) = \sum_s b_s' B_s, \tag{3.10}$$

the latter sums of squares having respectively $(b + v - 2)$ and $(b - 1)$ degrees of freedom. $T_{ij}$ is the total for treatment $V_{ij}$ and $B_s$ is the $s$th block total. $b_s$ and $b_s'$ are the estimators of $\beta_s$ under $\Omega$ and $\omega$ respectively; $t_{ij}'$, the estimator of $\tau_{ij}$ under $\omega$, is necessarily zero. The error sum of squares for the intra-block analysis of variance is

$$\text{Res}(\beta, \tau \mid \Omega) = \sum_i\sum_j\sum_s (y_{ijs} - \bar{y})^2 - \text{Reg}(\beta, \tau \mid \Omega), \tag{3.11}$$

with $(vr - b - v + 1)$ degrees of freedom. In (3.11), note that the summation is restricted to values of $i$ and $j$ occurring with $s$ through the properties of the designs considered; this will be the case throughout this paper. The unadjusted block sum of squares is given by (3.10) and has $(b - 1)$ degrees of freedom.
We shall use the theory summarized in this section in the subsequent discussions. A basis for this theory is given by Wilks ([10], Sections 8.3 and 8.43).
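The bookkeeping of Section 3 (the F ratio (3.7) and the substitution $N = vr$, $k = b + v$, $r_1 = 2$, $r_2 = v - 1$) is captured by the short Python sketch below. The function names are hypothetical and the regression sums of squares in the example are made-up numbers; the point is only how the degrees of freedom arise.

```python
def regression_f(N, k, r1, r2, reg_Omega, reg_omega, res_Omega):
    """F of (3.7) together with its degrees of freedom (r2, N - k + r1 - 1)."""
    df_error = N - k + r1 - 1
    F = df_error * (reg_Omega - reg_omega) / (r2 * res_Omega)
    return F, (r2, df_error)


def intra_block_dfs(v, b, r):
    """Degrees of freedom of the treatment test with N = vr, k = b + v, r1 = 2, r2 = v - 1."""
    N, k, r1, r2 = v * r, b + v, 2, v - 1
    return r2, N - k + r1 - 1


# the 18-treatment design of Section 1 (v = 18, b = 15, r = 5); sums of squares are hypothetical
print(intra_block_dfs(18, 15, 5))
print(regression_f(90, 33, 2, 17, 10.0, 4.0, 29.0))
```

The error degrees of freedom come out as $vr - b - v + 1$, the value quoted after (3.11).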


4. General analysis of variance modified. The basic intra-block analysis of variance for partially balanced incomplete block designs with two associate classes is known ([7], Table 1.0). In our notation, the adjusted treatment sum of squares is

$$\text{Adj. Treat. S.S.} = \sum_i\sum_j t_{ij} Q_{ij}, \tag{4.1}$$

where

$$Q_{ij} = T_{ij} - B_{ij\cdot}/k, \tag{4.2}$$





with $B_{ij\cdot}$ the total of block totals for blocks containing $V_{ij}$. For the subclass of group divisible designs,

$$v\lambda_2(\lambda_1 + rk - r)\,t_{ij} = k(v\lambda_2 + \lambda_1 - \lambda_2)\,Q_{ij} + k(\lambda_1 - \lambda_2)\sum_{q \neq j} Q_{iq}, \tag{4.3}$$

obtained from the reference ([7], Eqs. 1.11 to 1.19). If $j$ in (4.3) is replaced by $q$ and both sides of (4.3) summed over values of $q \neq j$, the resulting identity may be substituted back into (4.3) with simple algebraic reduction based on the relations (vii) of Section 2 to yield

$$\Big[(\lambda_2 + rk - r)\,t_{ij} + (\lambda_2 - \lambda_1)\sum_{q \neq j} t_{iq}\Big]\Big/k = Q_{ij}. \tag{4.4}$$

The adjusted treatment sum of squares expressed in terms of the estimators of treatment effects alone is

$$\text{Adj. Treat. S.S.} = \frac{\lambda_1 + rk - r}{k}\sum_i\sum_j t_{ij}^2 + \frac{\lambda_2 - \lambda_1}{k}\sum_i\Big(\sum_j t_{ij}\Big)^2, \tag{4.5}$$

obtained by substituting $Q_{ij}$ in (4.4) into (4.1).

The result of (4.5) is a form more suitable for the consideration of factorials than (4.1). Usually in analysis of variance, computing is based on (4.1). It is in fact simpler when using a desk calculator to substitute for the $Q_{ij}$ in (4.3) to obtain

$$t_{ij} = \frac{kv\lambda_2 T_{ij} - k(\lambda_2 - \lambda_1)\sum_q T_{iq} - v\lambda_2 B_{ij\cdot} + (\lambda_2 - \lambda_1)\sum_q B_{iq\cdot}}{v\lambda_2(\lambda_1 + rk - r)} \tag{4.6}$$

from two-way tables of values of $T_{ij}$ and $B_{ij\cdot}$. Substitution in (4.5) is then based on (4.6).

The analysis of variance is completed by the calculation of the unadjusted block sum of squares and the total sum of squares, for the error sum of squares is obtained by subtraction.

$$\text{Unadj. Block S.S.} = \frac{1}{k}\sum_s B_s^2 - \frac{G^2}{vr}, \tag{4.7}$$

$$\text{Total S.S.} = \sum_i\sum_j\sum_s y_{ijs}^2 - \frac{G^2}{vr}. \tag{4.8}$$

$G$ is the grand total of all observations, $\sum_i\sum_j\sum_s y_{ijs}$. Degrees of freedom for Adj. Treat. S.S., Unadj. Block S.S., Total S.S., and Error S.S. are respectively $(v - 1)$, $(b - 1)$, $(rv - 1)$, and $[(r - 1)v - b + 1]$.
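The algebra linking (4.3), (4.4), and (4.5) can be checked with exact rational arithmetic. The following Python sketch (hypothetical function names) maps a table of $t_{ij}$ to $Q_{ij}$ by (4.4), maps it back by (4.3), and confirms that $\sum t_{ij} Q_{ij}$ agrees with (4.5). It uses the parameters of the 18-treatment design of Section 1 with $\lambda_1 = 5$, $\lambda_2 = 1$ (the only values (vii) allows for it); the table of $t_{ij}$, with grand total zero as (2.2) requires, is made up.

```python
from fractions import Fraction


def q_from_t(t, k, r, lam1, lam2):
    """(4.4): Q_ij = [(lam2 + rk - r) t_ij + (lam2 - lam1) sum_{q != j} t_iq] / k."""
    Q = []
    for row in t:
        s = sum(row)
        Q.append([((lam2 + r * k - r) * tij + (lam2 - lam1) * (s - tij)) / Fraction(k)
                  for tij in row])
    return Q


def t_from_q(Q, v, k, r, lam1, lam2):
    """(4.3): v lam2 (lam1 + rk - r) t_ij = k(v lam2 + lam1 - lam2) Q_ij + k(lam1 - lam2) sum_{q != j} Q_iq."""
    denom = Fraction(v * lam2 * (lam1 + r * k - r))
    t = []
    for row in Q:
        s = sum(row)
        t.append([(k * (v * lam2 + lam1 - lam2) * qij + k * (lam1 - lam2) * (s - qij)) / denom
                  for qij in row])
    return t


def adjusted_treatment_ss(t, k, r, lam1, lam2):
    """(4.5): ((lam1 + rk - r)/k) sum t_ij^2 + ((lam2 - lam1)/k) sum_i (sum_j t_ij)^2."""
    K1 = Fraction(lam1 + r * k - r, k)
    K2 = Fraction(lam2 - lam1, k)
    return K1 * sum(x * x for row in t for x in row) + K2 * sum(sum(row) ** 2 for row in t)


v, b, r, k, m, n, lam1, lam2 = 18, 15, 5, 6, 6, 3, 5, 1
t = [[Fraction(x) for x in row] for row in
     ([1, -2, 0], [3, 0, -1], [0, 0, 0], [-1, 1, 2], [2, -3, 0], [-1, 0, -1])]  # hypothetical
Q = q_from_t(t, k, r, lam1, lam2)
print(adjusted_treatment_ss(t, k, r, lam1, lam2))
```

Exact fractions are used so that the round trip and the two forms of the sum of squares can be compared without rounding error.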

The variance of the difference between estimators of first-associate treatment effects is

$$V(t_{ij} - t_{ij'}) = 2k\sigma^2/(\lambda_1 + rk - r), \qquad j \neq j'; \tag{4.9}$$

the variance of the difference between estimators of second-associate treatment effects is

$$V(t_{ij} - t_{i'j'}) = 2k\sigma^2(\lambda_1 + v\lambda_2 - \lambda_2)/v\lambda_2(\lambda_1 + rk - r), \qquad i \neq i'. \tag{4.10}$$

These variances are estimated by substituting the error mean square from the analysis of variance for $\sigma^2$.

The efficiencies of first and second associate treatment comparisons have 
been given by Bose and his associates [7]. These efficiencies are obtained by 
taking the ratio of the variance of the treatment contrast for a randomized 
block design to the corresponding variance for the incomplete block design 
given equal values of r and on the assumption that both designs yield the same 


experimental error. The efficiency for the comparison of two treatments that are first associates is

$$E_1 = (\lambda_1 + rk - r)/rk \tag{4.11}$$

and, for two treatments that are second associates, the efficiency is

$$E_2 = v\lambda_2(\lambda_1 + rk - r)/rk(\lambda_1 + v\lambda_2 - \lambda_2). \tag{4.12}$$

$E_1$ and $E_2$ are in more explicit forms than given previously and are derivable from (4.9) and (4.10) and the fact that the corresponding variance for the randomized block design is $2\sigma^2/r$.
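Formulas (4.9) to (4.12) translate directly into code. This Python sketch (hypothetical function name) returns the two variances in units of $\sigma^2$ and the two efficiencies as exact fractions, and the example confirms that each efficiency equals the randomized block variance $2\sigma^2/r$ divided by the corresponding variance. The parameters are again those of the 18-treatment design of Section 1 with $\lambda_1 = 5$, $\lambda_2 = 1$, the values forced by (vii).

```python
from fractions import Fraction


def gd_variances_and_efficiencies(v, r, k, lam1, lam2):
    """Return V1/sigma^2, V2/sigma^2 from (4.9)-(4.10) and E1, E2 from (4.11)-(4.12)."""
    base = lam1 + r * k - r
    var1 = Fraction(2 * k, base)
    var2 = Fraction(2 * k * (lam1 + v * lam2 - lam2), v * lam2 * base)
    e1 = Fraction(base, r * k)
    e2 = Fraction(v * lam2 * base, r * k * (lam1 + v * lam2 - lam2))
    return var1, var2, e1, e2


var1, var2, e1, e2 = gd_variances_and_efficiencies(18, 5, 6, 5, 1)
print(var1, var2, e1, e2)
```

For this design first-associate comparisons are as precise as in a randomized block design ($E_1 = 1$), while second-associate comparisons lose some efficiency ($E_2 = 9/11$).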


5. The basic two-factor factorial. To introduce factorials into two-associate 
class group divisible designs, we first consider a basic two-factor factorial. Then 
it will be possible to show how multi-factor factorials may be used. 

Consider $A$ and $C$ factors with $m$ and $n$ levels respectively providing $v = mn$ treatment combinations associated with the $V_{ij}$ so that

$$\tau_{ij} = \alpha_i + \gamma_j + \delta_{ij} \tag{5.1}$$

with restrictions,

$$\sum_i \alpha_i = 0, \tag{5.2}$$

$$\sum_j \gamma_j = 0, \tag{5.3}$$

$$\sum_i \delta_{ij} = 0, \tag{5.4}$$

and

$$\sum_j \delta_{ij} = 0. \tag{5.5}$$

Equations (5.2) to (5.5) represent $(m + n + 1)$ linearly independent restrictions on the $(mn + m + n)$ new parameters. $\alpha_i$, $\gamma_j$, and $\delta_{ij}$ are parameters representing respectively the effects of the $i$th level of the A-factor, the $j$th level of the C-factor, and the interaction of the $i$th level of the A-factor and the $j$th level of the C-factor. Corresponding Latin letters will be used for estimators of these effects.





INTRA-BLOCK ANALYSIS 355 


The change to factorial parameters may be regarded simply as a one-to-one 
transformation in the parameter space. It follows that 


(5.6) tis = A; + 05 + dis 
and substitution in (4.5) yields 


Adj. Treat. 8.S. = oe p» a; + a. > ¢ 
k 


after reduction based on (vii) of Section 2. Use of the general regressior theory 
is sufficient to obtain 


+BtR—O Dla 


Adj. 8.8. (A) = me > ai, 


Adj. S.S. (C) = mt one 7 ae. 


(5.10) Adj. 8.8. (Ac) = O+ = oe Db dis, 
‘ ? 


with (m — 1), (n — 1), and (m — 1)(n — 1) degrees of freedom respectively. 
The complete analysis of variance is given in Table 1. Definition of (5.8), (5.9), and (5.10) is complete when we note that

(5.11) $a_i = \sum_j t_{ij}/n = t_{i.}$,

(5.12) $c_j = \sum_i t_{ij}/m = t_{.j}$,

(5.13) $d_{ij} = t_{ij} - t_{i.} - t_{.j}$,

computed most easily from the two-way table of values of $t_{ij}$. Independence of the sums of squares in (5.8), (5.9), and (5.10) follows from Cochran's theorem [11].
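A minimal sketch in Python with NumPy shows how (5.11) to (5.13) and the sums of squares (5.7) to (5.10) can be computed from the two-way table of $t_{ij}$. The function name `factorial_adjusted_ss` and the parameter values are hypothetical. The parameters were chosen only to satisfy the group divisible relation $r(k-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$, and no check is made that a design with them exists. The $t_{ij}$ are assumed to sum to zero. Under those assumptions the A, C and AC parts should add up to the adjusted treatment sum of squares of Table 1.

```python
import numpy as np

def factorial_adjusted_ss(t, lam1, lam2, r, k):
    """Split the adjusted treatment S.S. of Table 1 into A, C and AC parts.

    t is the m x n two-way table of intra-block treatment estimates t_ij,
    assumed here to sum to zero over all cells.
    """
    t = np.asarray(t, dtype=float)
    m, n = t.shape
    v = m * n
    a = t.mean(axis=1)                   # (5.11) a_i = t_i.
    c = t.mean(axis=0)                   # (5.12) c_j = t_.j
    d = t - a[:, None] - c[None, :]      # (5.13) d_ij = t_ij - t_i. - t_.j
    k1 = (lam1 + r * k - r) / k          # K_1 of Table 1
    k2 = (lam2 - lam1) / k               # K_2 of Table 1
    return {
        "a": a, "c": c, "d": d,
        "treatments": k1 * np.sum(t**2) + k2 * np.sum(t.sum(axis=1) ** 2),  # (5.7)
        "A": n * lam2 * v / k * np.sum(a**2),                              # (5.8)
        "C": m * k1 * np.sum(c**2),                                        # (5.9)
        "AC": k1 * np.sum(d**2),                                           # (5.10)
    }

# Hypothetical parameters: m = 3 groups of n = 2, r = 3, k = 3, lambda_1 = 2, lambda_2 = 1,
# which satisfy r(k - 1) = (n - 1) lambda_1 + n(m - 1) lambda_2.
rng = np.random.default_rng(0)
t = rng.normal(size=(3, 2))
t -= t.mean()                            # impose sum of t_ij = 0
ss = factorial_adjusted_ss(t, lam1=2, lam2=1, r=3, k=3)
```

The A-factor multiplier $n\lambda_2 v/k$ equals $nK_1 + n^2K_2$ only when the relation above holds. If the three parts fail to add up, the chosen parameters are the first thing to check.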

We sketch the use of the general regression theory of Section 3 and its application to our problem by considering Adj. S.S. (A).

To effect the regression with the complete model obtained by substituting for $\tau_{ij}$ of (5.1) in (2.1), it is necessary to minimize

(5.14) $\sum_i\sum_j\sum_s (y_{ijs} - \mu - \alpha_i - \gamma_j - \delta_{ij} - \beta_s)^2$,

subject to the restraint (2.3) and to the (m + n + 1) linearly independent restraints of (5.2) to (5.5), through use of Lagrange multipliers. The resulting





CLYDE YOUNG KRAMER AND RALPH ALLAN BRADLEY 


TABLE 1
Intra-block analysis of variance for the basic two-factor factorial

Source of Variation | Degrees of Freedom | Sum of Squares*
Treatments (adjusted) | $v - 1 = mn - 1$ | $K_1\sum_i\sum_j t_{ij}^2 + K_2\sum_i\big(\sum_j t_{ij}\big)^2$
  A-factor (adjusted) | $m - 1$ | $(nK_1 + n^2K_2)\sum_i a_i^2$
  C-factor (adjusted) | $n - 1$ | $mK_1\sum_j c_j^2$
  AC-interaction (adjusted) | $(m - 1)(n - 1)$ | $K_1\sum_i\sum_j (t_{ij} - t_{i.} - t_{.j})^2$
Blocks (unadjusted) | $b - 1$ | $\frac{1}{k}\sum_s B_s^2 - \frac{G^2}{mnr}$
Error | $mn(r - 1) - b + 1$ | By subtraction
Total | $mnr - 1$ | $\sum_i\sum_j\sum_s y_{ijs}^2 - \frac{G^2}{mnr}$

*$K_1 = (\lambda_1 + rk - r)/k$, $K_2 = (\lambda_2 - \lambda_1)/k$, and $nK_1 + n^2K_2 = n\lambda_2 v/k$.


estimators are those given in (5.11) to (5.13) for $\alpha_i$, $\gamma_j$, and $\delta_{ij}$, and the estimator of $\mu$ is $G/vr$. It follows that

(5.15) $\text{Reg}(\alpha, \gamma, \delta, \beta \mid \Omega) = \sum_i a_i A_i + \sum_j c_j C_j + \sum_i\sum_j d_{ij} D_{ij} + \sum_s b_s B_s$,

where $A_i = \sum_j T_{ij}$, $C_j = \sum_i T_{ij}$, $D_{ij} = T_{ij}$, and $B_s$ is defined after (3.9). $\Omega$ is the parameter space defined by the indicated restrictions.

The null hypothesis of no A-effects implies (m - 1) additional linearly independent restrictions sufficient to make each $\alpha_i = 0$, and they reduce consideration to a parameter space $\omega_a$, a subspace of $\Omega$. Under these conditions it is necessary to minimize

(5.16) $\sum_i\sum_j\sum_s (y_{ijs} - \mu - \gamma_j - \delta_{ij} - \beta_s)^2$

with use of Lagrange multipliers and the restraints (2.3) and (5.3) to (5.5). Estimators of $\mu$, $\gamma_j$, and $\delta_{ij}$ are unchanged; a new estimator $b_s'$ of $\beta_s$ is obtained. Now

(5.17) $\text{Reg}(\gamma, \delta, \beta \mid \omega_a) = \sum_j c_j C_j + \sum_i\sum_j d_{ij} D_{ij} + \sum_s b_s' B_s$.

The estimators $b_s$ and $b_s'$ are fairly complex, but we need only note that


(5.18) $b_s' = b_s + \frac{1}{k}\sum_i n_s(i)\, a_i$,

where $n_s(i)$ is the number of times a treatment combination with the ith level of the A-factor occurs in block s.







Adj. S.S. (A) is the difference $\text{Reg}(\alpha, \gamma, \delta, \beta \mid \Omega) - \text{Reg}(\gamma, \delta, \beta \mid \omega_a)$; using (5.15), (5.17), and (5.18), we have

(5.19) $\text{Adj. S.S.}(A) = \sum_i a_i A_i - \frac{1}{k}\sum_s\sum_i n_s(i)\, a_i B_s$.

But

$\sum_s\sum_i n_s(i)\, a_i B_s = \sum_i a_i \sum_s n_s(i) B_s = \sum_i a_i \sum_j B_{ij}$,

and, from the definition of $A_i$, $A_i - \frac{1}{k}\sum_j B_{ij} = \sum_j Q_{ij}$. It follows that

(5.20) $\text{Adj. S.S.}(A) = \sum_i a_i \sum_j Q_{ij}$

and

(5.21) $\sum_j Q_{ij} = n\lambda_2 v\, a_i/k$,

the latter result obtained from (4.4), (5.2) to (5.6), and algebraic reduction based on (vii) of Section 2. The final form for Adj. S.S. (A) given in (5.8) is now evident. The degrees of freedom are (m - 1), since (m - 1) additional restrictions were required to reduce $\Omega$ to $\omega_a$. Adj. S.S. (C) and Adj. S.S. (AC) are obtained in much the same way.

It is of interest in some applications to have the variances of $(a_i - a_{i'})$, $i \neq i'$, and of $(c_j - c_{j'})$, $j \neq j'$. These variances are most easily obtained from the forms of the multipliers of $\sum_i t_{i.}^2 = \sum_i a_i^2$ and of $\sum_j t_{.j}^2 = \sum_j c_j^2$ in the analysis of variance of Table 1. It follows that

(5.22) $V(a_i - a_{i'}) = 2k\sigma^2/(n\lambda_2 v)$, $i \neq i'$,

and

(5.23) $V(c_j - c_{j'}) = 2k\sigma^2/\{m(\lambda_1 + rk - r)\}$, $j \neq j'$.


The error mean square of the analysis of variance is used to estimate $\sigma^2$ and consequently the variances of (5.22) and (5.23). Alternate derivations of (5.22) and (5.23) may be obtained through the forms (5.11) and (5.12), given the variances and covariances of the $t_{ij}$. Considerable algebra is involved in the derivation of these variances and covariances, and we do not include it here. It may, however, be useful to have these results, and we now state without proof that

(5.24) $V(t_{ij}) = k\sigma^2\Big[\frac{n - 1}{n(\lambda_1 + rk - r)} + \frac{m - 1}{mn\lambda_2 v}\Big]$,

(5.25) $\text{Cov}(t_{ij}, t_{ij'}) = k\sigma^2\Big[\frac{m - 1}{mn\lambda_2 v} - \frac{1}{n(\lambda_1 + rk - r)}\Big]$, $j \neq j'$,

(5.26) $\text{Cov}(t_{ij}, t_{i'j'}) = -k\sigma^2/(mn\lambda_2 v)$, $i \neq i'$.
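The stated results (5.24) to (5.26) can be checked numerically. The sketch below is an illustration and not the authors' derivation. It relies on a standard intra-block result: if the $t_{ij}$ are constrained to sum to zero, their covariance matrix is $\sigma^2 C^{+}$, where $C = rI - NN'/k$ is the information matrix and $C^{+}$ its Moore-Penrose inverse. $NN'$ is built from the group divisible parameters alone, with treatments ordered group by group so that $V_{ij}$ has index $in + j$. The helper name `gd_information_matrix`, the hypothetical parameter values, and $\sigma^2 = 1$ are assumptions of the sketch.

```python
import numpy as np

def gd_information_matrix(m, n, r, k, lam1, lam2):
    """C = r I - N N'/k for a group divisible design, from its parameters only.

    Treatments are ordered group by group: V_ij has index i*n + j.
    """
    I_m, I_n = np.eye(m), np.eye(n)
    J_m, J_n = np.ones((m, m)), np.ones((n, n))
    nnt = (r * np.eye(m * n)
           + lam1 * np.kron(I_m, J_n - I_n)     # first associates: same group
           + lam2 * np.kron(J_m - I_m, J_n))    # second associates: different groups
    return r * np.eye(m * n) - nnt / k

m, n, r, k, lam1, lam2 = 3, 2, 3, 3, 2, 1        # hypothetical parameter values
v, q1 = m * n, lam1 + r * k - r
cov = np.linalg.pinv(gd_information_matrix(m, n, r, k, lam1, lam2))  # Cov(t) / sigma^2

var_t = k * ((n - 1) / (n * q1) + (m - 1) / (m * n * lam2 * v))       # (5.24)
cov_same_group = k * ((m - 1) / (m * n * lam2 * v) - 1 / (n * q1))     # (5.25)
cov_diff_group = -k / (m * n * lam2 * v)                               # (5.26)

print(cov[0, 0], var_t)              # V(t_11)
print(cov[0, 1], cov_same_group)     # Cov(t_11, t_12): same group
print(cov[0, n], cov_diff_group)     # Cov(t_11, t_21): different groups
```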







Efficiencies of factorial contrasts may be obtained in the same way as $E_1$ and $E_2$ in (4.11) and (4.12). The variances corresponding to (5.22) and (5.23) respectively for a randomized block design are $2\sigma^2/rn$ and $2\sigma^2/rm$, on the assumption again of equal experimental errors for the complete and incomplete block designs. The efficiency for contrasts among A-factor effects is

(5.27) $E_A = \lambda_2 v/rk$

and the efficiency for contrasts among C-factor effects is

(5.28) $E_C = (\lambda_1 + rk - r)/rk$.

The variance for an interaction contrast, normalized so that the squares of its coefficients sum to unity, in the group divisible design is

$k\sigma^2/(\lambda_1 + rk - r)$

from Table 1, and is $\sigma^2/r$ for the randomized block design. Consequently, the efficiency for an AC-interaction contrast is also

(5.29) $E_{AC} = (\lambda_1 + rk - r)/rk$.

Note that $E_C = E_{AC} = E_1$. The two-associate class group divisible designs have three subclasses as noted earlier. For the singular subclass, $\lambda_1 = r$ and $E_C = E_{AC} = 1$; for the semi-regular subclass, $\lambda_2 v = rk$ and $E_A = 1$. In the next section we discuss individual comparisons and multi-factor factorials. We now note, somewhat in advance, that all individual comparisons and subfactor effects of the A-factor have the efficiency $E_A$, those of the C-factor have efficiency $E_C$, and those of the AC-interaction have efficiency $E_{AC}$.
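As a rough numerical cross-check of (5.27) to (5.29), the sketch below divides each randomized block variance quoted above by the corresponding variance computed from $\sigma^2 C^{+}$. This assumes, as is standard for intra-block estimates constrained to sum to zero, that $\sigma^2 C^{+}$ with $C = rI - NN'/k$ gives the covariances of the $t_{ij}$. The function names and the parameter sets are hypothetical. The parameters were picked to satisfy $r(k-1) = (n-1)\lambda_1 + n(m-1)\lambda_2$: a general case, one with $\lambda_1 = r$ (singular), and one with $\lambda_2 v = rk$ (semi-regular). Whether such designs exist is not checked.

```python
import numpy as np

def gd_information_matrix(m, n, r, k, lam1, lam2):
    """C = r I - N N'/k built from the GD parameters; V_ij has index i*n + j."""
    I_m, I_n = np.eye(m), np.eye(n)
    J_m, J_n = np.ones((m, m)), np.ones((n, n))
    nnt = r * np.eye(m * n) + lam1 * np.kron(I_m, J_n - I_n) + lam2 * np.kron(J_m - I_m, J_n)
    return r * np.eye(m * n) - nnt / k

def efficiencies(m, n, r, k, lam1, lam2):
    """Efficiencies relative to a randomized block design with equal error variance."""
    cov = np.linalg.pinv(gd_information_matrix(m, n, r, k, lam1, lam2))  # sigma^2 = 1
    I_m, I_n = np.eye(m), np.eye(n)
    a_diff = np.kron(I_m[0] - I_m[1], np.ones(n) / n)           # a_1 - a_2
    c_diff = np.kron(np.ones(m) / m, I_n[0] - I_n[1])           # c_1 - c_2
    inter = np.kron(I_m[0] - I_m[1], I_n[0] - I_n[1]) / 2      # unit-norm AC contrast
    var = lambda coef: coef @ cov @ coef
    return {
        "A": (2 / (r * n)) / var(a_diff),
        "C": (2 / (r * m)) / var(c_diff),
        "AC": (1 / r) / var(inter),
    }

eff = efficiencies(3, 2, 3, 3, 2, 1)          # general case
eff_singular = efficiencies(3, 2, 2, 4, 2, 1)  # lambda_1 = r
eff_semireg = efficiencies(3, 2, 2, 3, 0, 1)   # lambda_2 v = r k
```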


6. Individual comparisons and multi-factor factorials. Individual or single-degree-of-freedom comparisons are possible in much the usual way.

Let $\xi$ be an (m - 1) by m orthogonal matrix and $\eta$ an (n - 1) by n orthogonal matrix, used to transform the $\alpha$'s and $\gamma$'s respectively. Contrasts on A-factor effects would be

(6.1) $\xi_u = \sum_i \xi_{iu}\alpha_i$, $u = 1, \ldots, (m - 1)$,

and on C-factor effects,

(6.2) $\eta_v = \sum_j \eta_{jv}\gamma_j$, $v = 1, \ldots, (n - 1)$.


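To test the hypothesis that $\xi_u = 0$, we form the contrast

(6.3) $I_u = \sum_i \xi_{iu} t_{i.} = \sum_i\sum_j \xi_{iu} t_{ij}/n$

and

(6.4) $\text{Adj. S.S.}(I_u) = \frac{n\lambda_2 v\big(\sum_i \xi_{iu} t_{i.}\big)^2}{k\sum_i \xi_{iu}^2} = \frac{\lambda_2 v\big(\sum_i\sum_j \xi_{iu} t_{ij}\big)^2}{k\sum_i\sum_j \xi_{iu}^2}$.

Similarly, to test the hypothesis that $\eta_v = 0$, we form the contrast

(6.5) $J_v = \sum_j \eta_{jv} t_{.j} = \sum_i\sum_j \eta_{jv} t_{ij}/m$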







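and

(6.6) $\text{Adj. S.S.}(J_v) = \frac{m(\lambda_1 + rk - r)\big(\sum_j \eta_{jv} t_{.j}\big)^2}{k\sum_j \eta_{jv}^2} = \frac{(\lambda_1 + rk - r)\big(\sum_i\sum_j \eta_{jv} t_{ij}\big)^2}{k\sum_i\sum_j \eta_{jv}^2}$.

The contrast for interaction of $\xi_u$ and $\eta_v$ is

(6.7) $(\xi\eta)_{uv} = \sum_i\sum_j \xi_{iu}\eta_{jv}\delta_{ij}$.

The hypothesis $(\xi\eta)_{uv} = 0$ is tested through use of the contrast

(6.8) $(IJ)_{uv} = \sum_i\sum_j \xi_{iu}\eta_{jv} t_{ij}$

and

(6.9) $\text{Adj. S.S.}(IJ)_{uv} = \frac{(\lambda_1 + rk - r)\big(\sum_i\sum_j \xi_{iu}\eta_{jv} t_{ij}\big)^2}{k\sum_i\sum_j (\xi_{iu}\eta_{jv})^2}$.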


Cochran's theorem [11] is sufficient to demonstrate the independence of all the adjusted sums of squares: Adj. S.S. $(I_u)$, $u = 1, \ldots, (m - 1)$; Adj. S.S. $(J_v)$, $v = 1, \ldots, (n - 1)$; and Adj. S.S. $(IJ)_{uv}$, $u = 1, \ldots, (m - 1)$, $v = 1, \ldots, (n - 1)$. It also shows that they are appropriate for use in analysis of variance. Each has one degree of freedom, and F-tests are effected using the error mean square of Table 1.

Special definition of the matrices $\xi$ and $\eta$ permits the use of special contrasts. For example, rows of $\xi$ and $\eta$ may be defined such that contrasts on A-factor and C-factor effects measure trends (linear, quadratic, cubic, ...) over the factor levels.
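The trend contrasts just described can be illustrated with a short Python/NumPy sketch. Here the rows of $\xi$ and $\eta$ are taken, as an assumption, to be orthonormal polynomial (linear, quadratic, ...) contrasts for equally spaced levels, obtained from a QR factorization of a Vandermonde matrix. The sketch computes the single-degree-of-freedom sums of squares (6.4), (6.6) and (6.9) and checks that each family adds up to (5.8), (5.9) and (5.10) respectively. The function names, the table of $t_{ij}$ and the parameter values are hypothetical.

```python
import numpy as np

def poly_contrasts(levels):
    """(levels - 1) x levels matrix whose rows are orthonormal linear, quadratic, ...
    contrasts for equally spaced levels (QR of a Vandermonde matrix)."""
    x = np.arange(levels, dtype=float)
    q, _ = np.linalg.qr(np.vander(x, levels, increasing=True))
    return q[:, 1:].T          # drop the column proportional to the constant

def single_df_ss(t, lam1, lam2, r, k):
    """Single-degree-of-freedom adjusted S.S. (6.4), (6.6) and (6.9)."""
    t = np.asarray(t, dtype=float)
    m, n = t.shape
    v, q1 = m * n, lam1 + r * k - r
    xi, eta = poly_contrasts(m), poly_contrasts(n)      # rows play the role of xi_u, eta_v
    ti_dot, t_dotj = t.mean(axis=1), t.mean(axis=0)
    ss_I = n * lam2 * v * (xi @ ti_dot) ** 2 / (k * (xi**2).sum(axis=1))        # (6.4)
    ss_J = m * q1 * (eta @ t_dotj) ** 2 / (k * (eta**2).sum(axis=1))            # (6.6)
    denom = np.outer((xi**2).sum(axis=1), (eta**2).sum(axis=1))
    ss_IJ = q1 * (xi @ t @ eta.T) ** 2 / (k * denom)                            # (6.9)
    return ss_I, ss_J, ss_IJ

# Hypothetical example: m = 4, n = 3, parameters satisfying r(k-1) = (n-1)lam1 + n(m-1)lam2
m, n, r, k, lam1, lam2 = 4, 3, 5, 4, 3, 1
rng = np.random.default_rng(1)
t = rng.normal(size=(m, n))
t -= t.mean()
ss_I, ss_J, ss_IJ = single_df_ss(t, lam1, lam2, r, k)

# Grouped totals, to compare with (5.8), (5.9) and (5.10)
q1 = lam1 + r * k - r
a, c = t.mean(axis=1), t.mean(axis=0)
d = t - a[:, None] - c[None, :]
ss_A = n * lam2 * m * n / k * np.sum(a**2)
ss_C = m * q1 / k * np.sum(c**2)
ss_AC = q1 / k * np.sum(d**2)
```

A QR routine may flip the sign of a row. This does not affect any sum of squares, because each contrast enters squared.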


Suppose the A-factor has levels which themselves are factorial combinations of other factors. Let there be p such factors, $A_1, \ldots, A_p$, with levels $m_1, \ldots, m_p$. It is only required that $m = \prod_{i=1}^p m_i$. Then $\xi$ may be chosen in the obvious way, so that the contrasts defined may be grouped to obtain main-effect and interaction comparisons for the subfactors of A. The corresponding adjusted sums of squares, each with one degree of freedom, may be grouped if desired. This gives Adj. S.S. $(A_1)$ with $(m_1 - 1)$ degrees of freedom, Adj. S.S. $(A_2)$ with $(m_2 - 1)$ degrees of freedom, Adj. S.S. for interaction of $A_1$ and $A_2$ with $(m_1 - 1)(m_2 - 1)$ degrees of freedom, etc. Alternately, these sums of squares may be computed by forming the usual two-way, three-way, etc., tables of values of $t_{i.}$ and effecting the computation as though they were observations in a single replication on factorial treatment combinations only. The resulting sums of squares are finally multiplied by the coefficient $n\lambda_2 v/k$ of (6.4). Similarly, the C-factor may consist of factorial combinations of q factors, $C_1, \ldots, C_q$, with levels $n_1, \ldots, n_q$ such that $\prod_{j=1}^q n_j = n$. Appropriate contrasts and adjusted sums of squares may then be obtained with proper selection of the rows of $\eta$. When $\xi$ and $\eta$ have been defined, the corresponding contrasts for interaction of A-factor and C-factor contrasts follow immediately. These in turn yield adjusted sums of squares that may be grouped to give sums of squares for interaction of $A_1$ and $C_1$; of $A_1$, $A_2$, and $C_1$; etc.







Now we have shown how multi-factor factorials may be used in two-associate 
class group divisible designs. It is also evident that fractional factorials may be 
used. The levels of the A-factor may be designated to be m treatment combina- 
tions of a fractional factorial which is a fraction of a full factorial with, say, 
hm treatment combinations, h an integer. The levels of the A-factor would 
then form a (1/h)-th fraction of the full factorial. Similarly, the n levels of the 
C-factor might be a fraction of a second set of factorial treatment combina- 
tions. Analysis of the resulting fractional factorial experiment would again 
depend only on proper specification of $\xi$ and $\eta$. We would have r replications
of a fractional factorial in the experiment. This may be a very useful system 
when it is necessary to use small incomplete blocks in a study. 


7. Remarks. We have shown how factorials may be incorporated in group 
divisible partially balanced incomplete block designs with two associate classes. 
The factorial treatment combinations were so matched with treatments in the 
rectangular association schemes for these designs as to yield quite simple analyses. 
Other correspondences between factorial treatments and the treatments of the 
basic designs may be possible, but we would expect that they would result in 
considerably more complex analyses and in lack of orthogonality among the 
factorial comparisons. The problem of the recovery of inter-block information 
is being considered. 

The group divisible subclass of the two-associate class of partially balanced 
incomplete block designs is only one of five subclasses given in [7]. The others 
listed are Simple, Triangular, Latin Square Type, and Cyclic, and comprise 
only a minor percentage of the designs listed in the reference. In particular, 
many designs of the Simple and Cyclic subclasses have values of v which are 
prime numbers and are not therefore suitable for factorials. Factorials have been 
developed in Simple and Triangular designs for special cases, but a general 
development has not been found. 

In the view of the authors, important applications of these factorial incom- 
plete block designs should be forthcoming. They should be useful in large ani- 
mal experimentation where litter sizes sharply limit the amounts of homogene- 
ous experimental material available. In taste testing, fatigue and other factors 
limit the number of samples that can be considered at a session, and these de- 
signs have applications there. In industrial experimentation, it may not be 
possible to make many observations while normal production is interrupted, 
and again use of incomplete blocks may be desirable. Some numerical examples 
on the uses of these designs are being prepared for an applied paper [12]. 

Marvin Zelen [13] did some preliminary work on the use of factorials in incomplete block designs, and has subsequently obtained additional results independently of us. While the formulation and presentation given here are our own, we wish to acknowledge his cooperation through helpful discussions when this research was initiated.







REFERENCES 


[1] R. C. Bose and W. S. Connor, "Combinatorial properties of group divisible incomplete block designs," Ann. Math. Stat., Vol. 23 (1952), pp. 367-383.

[2] R. C. Bose, S. S. Shrikhande, and K. N. Bhattacharya, "On the construction of group divisible incomplete block designs," Ann. Math. Stat., Vol. 24 (1953), pp. 167-195.

[3] R. C. Bose and T. Shimamoto, "Classification and analysis of partially balanced designs with two associate classes," J. Amer. Stat. Assn., Vol. 47 (1952), pp. 151-190.

[4] Oscar Kempthorne, The Design and Analysis of Experiments, John Wiley and Sons, Inc., 1952.

[5] C. Y. Kramer and R. A. Bradley, "k(k - q) factorial treatments in near balance incomplete block designs," Progress Report No. 3 on Statistical Techniques for Sensory Testing of Processed Foods, Contract Report with U. S. D. A.

[6] Boyd Harshbarger, "The 2* Factorial in a Latinized Rectangular Lattice Design," Technical Report No. 1 (1954) to Office of Ordnance Research, OOR Project No. 1166.

[7] R. C. Bose, W. H. Clatworthy, and S. S. Shrikhande, "Tables of Partially Balanced Designs with Two Associate Classes," Tech. Bul. No. 107 (1954), North Carolina Agricultural Experiment Station.

[8] Boyd Harshbarger, "Near Balance Rectangular Lattices," Virginia Jour. of Sci., Vol. II (new series) (1951), pp. 13-27.

[9] Boyd Harshbarger, "Latinized Rectangular Lattices," Biometrics, Vol. 8 (1952), pp. 73-84.

[10] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1950.

[11] W. G. Cochran, "The distribution of quadratic forms in a normal system with applications to the analysis of covariance," Proc. Cambridge Philos. Soc., Vol. 30 (1934), pp. 178-191.

[12] C. Y. Kramer and R. A. Bradley, "Examples of intra-block analysis for factorials in two-associate class group divisible designs" (in preparation).

[13] Marvin Zelen, "The use of incomplete block designs for factorial experiments" (unpublished manuscript presented to Amer. Stat. Assn. at Montreal, Canada, Sept. 9-10, 1954).





STATISTICAL PROPERTIES OF INVERSE GAUSSIAN 
DISTRIBUTIONS. I. 


By M. C. K. Tweedie¹
Virginia Agricultural Experiment Station, Virginia Polytechnic Institute 


0. Summary. A report is presented on some statistical properties of the 
family of probability density functions 


$\exp[-\lambda(x - \mu)^2/(2\mu^2 x)]\,[\lambda/(2\pi x^3)]^{1/2}$

for a variate x and parameters $\mu$ and $\lambda$, with x, $\mu$, $\lambda$ each confined to $(0, \infty)$. The expectation of x is $\mu$, while $\lambda$ is a measure of relative precision. The chief result is that the ml estimators of $\mu$ and $\lambda$ have stochastically independent distributions, and are of a nature which permits of the construction of an analogue of the analysis of variance for nested classifications. The ml estimator of $\mu$ is the sample mean, and for a fixed sample size n its distribution is of the same family as x, with the same $\mu$ but with $\lambda$ replaced by $\lambda n$. The distribution of the ml estimator of the reciprocal of $\lambda$ is of the chi-square type. The probability distribution of $1/x$, and the estimation of certain functions of the parameters in heterogeneous data, are also considered.


1. Introduction. The name "Inverse Gaussian" has been suggested [1] for the members of a certain family of continuous probability density functions in which the variate takes positive values only. The family is generated by varying the values of two real positive parameters, which may be any independent pair from $\alpha$, $\lambda$, $\mu$, $\phi$, where $(2\alpha)^{-1/2} = \mu = \lambda/\phi$. The density function for the positive values of the variate may accordingly be written in the forms

(1a) $f_1(x; \alpha, \lambda) = \exp\{-\alpha\lambda x + \lambda(2\alpha)^{1/2} - \lambda/(2x)\}\,[\lambda/(2\pi x^3)]^{1/2}$,

(1b) $f_2(x; \mu, \lambda) = \exp\{-\lambda(x - \mu)^2/(2\mu^2 x)\}\,[\lambda/(2\pi x^3)]^{1/2}$,

(1c) $f_3(x; \mu, \phi) = \exp\{-\phi x/(2\mu) + \phi - \phi\mu/(2x)\}\,[\phi\mu/(2\pi x^3)]^{1/2}$,

(1d) $f_4(x; \phi, \lambda) = \exp\{-\phi^2 x/(2\lambda) + \phi - \lambda/(2x)\}\,[\lambda/(2\pi x^3)]^{1/2}$.
Each of these forms is convenient or suggestive for some purpose. 
The relationships 


(2) $f_2(x; \mu, \lambda) = \mu^{-1} f_2(x/\mu; 1, \phi) = \lambda^{-1} f_4(x/\lambda; \phi, 1)$


Received April 9, 1956; revised July 3, 1956. 
1 Present address: Statistical Laboratory, Department of Mathematics, The University, 
Manchester, England 







INVERSE GAUSSIAN DISTRIBUTIONS. 1. 363 


are useful in computing numerical values of the probability density. The cumulative probability function depends essentially on only two variables, which might be chosen to be $x/\mu$ and $\phi$. The case $\mu = 1$ could therefore be adopted as a standard form. Curves of the density functions for $\lambda = \phi = \frac{1}{4}, \frac{1}{2}, 1, 2, 4, 8, 16, 32$, with $\mu = 1$, are shown in Fig. 1. In some physical applications it is more natural to hold $\lambda$ constant, and Fig. 2 shows the density curves for $\lambda = 1$ with $\mu = 4$, 1, and $\frac{1}{4}$, i.e., for $\phi = \frac{1}{4}$, 1, and 4 respectively.
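A short Python sketch (standard library only) can check numerically that forms (1a), (1b) and (1d) describe the same density, and that the two scale relations in (2) hold. It also confirms that the density integrates to unity with mean $\mu$. The parameter values and the helper `integrate_positive` are arbitrary illustrative choices. The helper applies the trapezoid rule after the substitution $x = e^u$, which is a convenient choice and not anything taken from the paper. Form (1c) is left out of the check.

```python
import math

def f1(x, alpha, lam):
    """Form (1a)."""
    return math.exp(-alpha * lam * x + lam * math.sqrt(2 * alpha) - lam / (2 * x)) \
        * math.sqrt(lam / (2 * math.pi * x**3))

def f2(x, mu, lam):
    """Form (1b)."""
    return math.exp(-lam * (x - mu) ** 2 / (2 * mu**2 * x)) * math.sqrt(lam / (2 * math.pi * x**3))

def f4(x, phi, lam):
    """Form (1d)."""
    return math.exp(-phi**2 * x / (2 * lam) + phi - lam / (2 * x)) \
        * math.sqrt(lam / (2 * math.pi * x**3))

def integrate_positive(g, lo=-12.0, hi=8.0, steps=4000):
    """Trapezoid rule for the integral of g over (0, inf), using x = e^u."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(x) * x          # dx = x du
    return total * h

mu, lam = 2.0, 0.5
phi, alpha = lam / mu, 1 / (2 * mu**2)      # (2 alpha)^(-1/2) = mu = lam/phi
x = 0.7
values = [f1(x, alpha, lam), f2(x, mu, lam), f4(x, phi, lam),
          f2(x / mu, 1.0, phi) / mu,        # first relation in (2)
          f4(x / lam, phi, 1.0) / lam]      # second relation in (2)
total_mass = integrate_positive(lambda y: f2(y, mu, lam))
mean = integrate_positive(lambda y: y * f2(y, mu, lam))
```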

Since it will be found useful to consider also some functions of the same algebraic form but with complex values for some of the parameters, it may be noted that the integrals of functions such as (1) over the interval $(0, \infty)$ can be shown to be unity, provided that the real parts of $\lambda$ and of the mutually equal quantities $\alpha\lambda$ and $\frac{1}{2}\lambda\mu^{-2}$ are positive. For reference we reproduce an equation for a modified Bessel function of the second kind,

(3) $K_\nu(z) = \frac{1}{2}\big(\tfrac{1}{2}z\big)^{\nu}\int_0^\infty \exp\Big(-t - \frac{z^2}{4t}\Big)\, t^{-\nu-1}\,dt$,

given by Watson ([2], p. 183), under the condition that the real part of $z^2$ is positive, together with the result

(4) $K_{\pm 1/2}(z) = e^{-z}\big(\pi/(2z)\big)^{1/2}$,

also given by Watson ([2], p. 80).
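For real $z > 0$, the integral representation (3) can be compared with the closed form (4) in a few lines of Python. The sketch uses only the standard library. The function names are hypothetical. The quadrature is an assumed choice: the trapezoid rule after the substitution $t = e^u$, which handles both ends of the range well here because the integrand decays doubly exponentially.

```python
import math

def k_nu_integral(nu, z, lo=-20.0, hi=6.0, steps=20000):
    """Evaluate (3) for real z > 0:
    K_nu(z) = (1/2)(z/2)^nu * int_0^inf exp(-t - z^2/(4t)) t^(-nu-1) dt."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-t - z * z / (4 * t)) * t ** (-nu - 1) * t  # dt = t du
    return 0.5 * (z / 2) ** nu * total * h

def k_half_closed(z):
    """Closed form (4): K_{1/2}(z) = K_{-1/2}(z) = e^{-z} (pi / (2z))^{1/2}."""
    return math.exp(-z) * math.sqrt(math.pi / (2 * z))
```

For example, `k_nu_integral(0.5, 1.0)` and `k_half_closed(1.0)` should both be close to 0.46106.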


Fig. 1. Probability density curves for an Inverse Gaussian variate with $\mu = 1$ for 8 values of $\lambda$ or $\phi$. [Figure not reproduced.]





364 M. C. K. TWEEDIE 


The Inverse Gaussian family of distributions arises in a problem of Brownian 
motion (cf. [1], [3]), though then a further parameter appears in the physical 
formulation. The numerical value of this parameter can however normally be 
regarded as known, and it merely modifies the values of the parameters given 
in the expressions (1) above. Both $\lambda$ and $\mu$ are of the same physical dimensions as the random variable x itself. A change of scale of x, such as may be due to a change in measuring unit or, approximately, to changes in temperature or some other factor, produces another member of the family, in which $\lambda$ and $\mu$ have been multiplied by the same factor as x. The ratio $\phi$ is invariant under such a change.


Fig. 2. Probability density curves for an Inverse Gaussian variate with $\lambda = 1$ for 3 values of $\mu$ or $\phi$. [Figure not reproduced.]







The same family appears also as a limiting form for the distribution of the 
final sample size in a special case of Wald’s sequential likelihood ratio test [4]. 
Some properties of this family were studied in a degree thesis [5], where the 
Brownian motion problem was found to have an important part in the inter- 
pretation of some experimental work. The present paper establishes some of 
those exact properties in a more formal way, though using essentially the same 
methods as in the thesis. Some new results are included, and some further ones 
will be given in another paper. Not all these results are of quantitative im- 
portance in the original physical problem, and those which are not are pre- 
sented here for their theoretical interest. The formulae (1) will be regarded as 
given, in that no derivations will be offered here. The uniqueness of certain 
Laplace transforms will be an important factor in some of the proofs. The 
form (1a) is of the kind adopted in an earlier published paper [6], in which
similar methods were used. 


2. Basic characteristics.

The shape of the distribution depends on $\phi$ only. The distribution is unimodal, with its mode at

(5) $x_{\text{mode}} = \mu\Big\{\Big(1 + \frac{9}{4\phi^2}\Big)^{1/2} - \frac{3}{2\phi}\Big\}$.

The ratio $x_{\text{mode}}/\mu$ converges to 1 when $\phi$ is increased to infinity, while the ratio $x_{\text{mode}}/\lambda$ converges to $\frac{1}{3}$ when $\phi$ decreases to zero. The density at the mode is least when $\phi = 2$, if $\mu$ is fixed. The mode then occurs at $x = \frac{1}{2}\mu$, and the density there is $[8/(\pi\mu^2 e)]^{1/2} = 0.96788\mu^{-1}$.
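The statements about the mode can be checked numerically with the Python sketch below, which uses the standard library only. The function names, the choice $\mu = 1$, and the grid of $\phi$ values are illustrative assumptions. The sketch evaluates (5), the height of the density at the mode as $\phi$ varies with $\mu$ fixed, and the limit of $x_{\text{mode}}/\lambda$ for small $\phi$.

```python
import math

def ig_density(x, mu, lam):
    """Inverse Gaussian density, form (1b)."""
    return math.sqrt(lam / (2 * math.pi * x**3)) * math.exp(-lam * (x - mu) ** 2 / (2 * mu**2 * x))

def ig_mode(mu, phi):
    """Mode from (5): mu * {(1 + 9/(4 phi^2))^(1/2) - 3/(2 phi)}."""
    return mu * (math.sqrt(1 + 9 / (4 * phi**2)) - 3 / (2 * phi))

mu = 1.0

def mode_height(phi):
    """Density at the mode with mu fixed and lam = phi * mu."""
    return ig_density(ig_mode(mu, phi), mu, phi * mu)

grid = [0.5 + 0.001 * i for i in range(4001)]          # phi from 0.5 to 4.5
phi_star = min(grid, key=mode_height)                  # grid minimizer, expected near 2
small_phi_ratio = ig_mode(mu, 1e-4) / (1e-4 * mu)      # x_mode / lam, expected near 1/3
```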

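It is convenient to introduce the logarithm of the Laplace transform $E(e^{-tx})$ of the probability density of the variate, which is in a sense a cumulant-generating function (cgf). We denote the relevant function operator by L, with the variate symbol as a subscript and the other variables in parentheses. Thus, from form (1a),

(6) $L_x(t; \alpha, \lambda) = \ln\int_0^\infty e^{-tx}\exp\{-\alpha\lambda x + \lambda(2\alpha)^{1/2} - \lambda/(2x)\}\,[\lambda/(2\pi x^3)]^{1/2}\,dx$

(7) $\qquad = \lambda(2\alpha)^{1/2} - \lambda\{2(\alpha + t/\lambda)\}^{1/2} + \ln\int_0^\infty f_1(x; \alpha + t/\lambda, \lambda)\,dx$.

If t is imaginary, or, if real or complex, if its real part exceeds $-\alpha\lambda$, the integral in (7) is unity. Hence

(8a) $L_x(t; \alpha, \lambda) = \lambda\{(2\alpha)^{1/2} - 2^{1/2}(\alpha + t/\lambda)^{1/2}\}$

(8b) $\qquad = \frac{\lambda}{\mu}\Big\{1 - \Big(1 + \frac{2\mu^2 t}{\lambda}\Big)^{1/2}\Big\}$

(8c) $\qquad = \phi\Big\{1 - \Big(1 + \frac{2\mu t}{\phi}\Big)^{1/2}\Big\}$

(8d) $\qquad = \phi\Big\{1 - \Big(1 + \frac{2\lambda t}{\phi^2}\Big)^{1/2}\Big\}$.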


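This cgf is unique to the density function (1). The cumulants can be obtained from the power series expansion of $L_x(t; \mu, \lambda)$. They are:

$\kappa_1 = \mu = \lambda\phi^{-1}$, $\kappa_2 = \mu^3\lambda^{-1} = \lambda^2\phi^{-3}$,

$\kappa_3 = 3\mu^5\lambda^{-2} = 3\lambda^3\phi^{-5}$,

$\kappa_4 = 15\mu^7\lambda^{-3} = 15\lambda^4\phi^{-7}$,

and, in general, when $r \geq 2$,

(9) $\kappa_r = 1\cdot 3\cdot 5\cdots(2r - 3)\,\mu^{2r-1}\lambda^{1-r} = \lambda^r(2r - 3)!\,\phi^{1-2r}/\{2^{r-2}(r - 2)!\}$.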
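The passage from (8b) to (9) can be reproduced with exact rational arithmetic. The Python sketch below relies on one assumption: $L_x(t) = \ln E(e^{-tx})$ as defined above, so that $L_x(t) = \sum_r \kappa_r(-t)^r/r!$ and the coefficient of $t^r$ is $(-1)^r\kappa_r/r!$. It expands $(1 + u)^{1/2}$ with generalized binomial coefficients and compares the resulting cumulants with (9). It also checks (10) and the relation $\phi = \kappa_1^2/\kappa_2$. The function names and the rational test values of $\mu$ and $\lambda$ are hypothetical.

```python
from fractions import Fraction
from math import factorial

def binom_half(r):
    """Generalized binomial coefficient C(1/2, r) as an exact Fraction."""
    c = Fraction(1)
    for i in range(r):
        c *= (Fraction(1, 2) - i) / (i + 1)
    return c

def cumulant_from_cgf(r, mu, lam):
    """kappa_r read off the series of (8b), L(t) = (lam/mu){1 - (1 + 2 mu^2 t/lam)^(1/2)}.
    The coefficient of t^r equals (-1)^r kappa_r / r!."""
    coeff = -(lam / mu) * binom_half(r) * (2 * mu**2 / lam) ** r
    return coeff * factorial(r) * (-1) ** r

def cumulant_closed(r, mu, lam):
    """(9): kappa_r = 1*3*5*...*(2r-3) mu^(2r-1) lam^(1-r) for r >= 2; kappa_1 = mu."""
    if r == 1:
        return mu
    odd = 1
    for j in range(1, 2 * r - 2, 2):
        odd *= j
    return odd * mu ** (2 * r - 1) * lam ** (1 - r)

mu, lam = Fraction(3, 2), Fraction(5, 7)
kappas = [cumulant_closed(r, mu, lam) for r in range(1, 9)]
```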


Thus $\mu$ is the population mean and is primarily a measure of location, while $\lambda$ is an inverse measure of relative dispersion, being the ratio of $\kappa_1^3$ to $\kappa_2$, or

(10) $\lambda = \kappa_1^3/\kappa_2$.

Also, $\phi = \kappa_1^2/\kappa_2$. The Fisherian shape coefficients, or standardized cumulants, are

(11) $\gamma_r = \kappa_{r+2}\kappa_2^{-r/2-1} = 1\cdot 3\cdot 5\cdots(2r + 1)\,\phi^{-r/2}$;

in particular $\gamma_1 = \kappa_3\kappa_2^{-3/2} = 3\phi^{-1/2}$ and $\gamma_2 = \kappa_4\kappa_2^{-2} = 15\phi^{-1}$.

The fractional coefficient of variation is $\gamma_{-1} = \kappa_1^{-1}\kappa_2^{1/2} = \phi^{-1/2}$, so that $\gamma_1 = 3\gamma_{-1}$, $\gamma_2 = 15\gamma_{-1}^2$, and so on. Evidently the distribution becomes more and more nearly normal when $\phi$ is increased. This parameter $\phi$ might be called the normality parameter or the shape parameter.

In the probability density curves shown in Fig. 1, $\gamma_1$ ranges from 6 down to 0.53, and $\gamma_2$ ranges from 60 down to 0.47. The approach to normality in the
neighbourhood of x = yu is evident from these curves. However, some important 
aspects of the distributions, such as the standardized cumulants, depend pri- 
marily on the behaviour of the functions at very large values of the variate, 
whereas the diagrams are necessarily bounded. 
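The quoted ranges follow from (11) evaluated at the eight values of $\phi$ listed for Fig. 1. A few lines of Python reproduce the arithmetic. The helper name `gamma` is hypothetical.

```python
phis = [0.25, 0.5, 1, 2, 4, 8, 16, 32]   # phi values of Fig. 1 (mu = 1)

def gamma(r, phi):
    """(11): gamma_r = 1*3*5*...*(2r+1) * phi^(-r/2)."""
    odd = 1
    for j in range(1, 2 * r + 2, 2):
        odd *= j
    return odd * phi ** (-r / 2)

g1 = [gamma(1, p) for p in phis]   # from 6 down to about 0.53
g2 = [gamma(2, p) for p in phis]   # from 60 down to about 0.47
```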

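The positive integral moments about zero are obtainable either from (9) or by direct integration, using (3) and a further result given by Watson ([2], Eq. (12), p. 80). They are:

$\mu_1' = \mu$, $\mu_2' = \mu^2 + \mu^3\lambda^{-1}$,

$\mu_3' = \mu^3 + 3\mu^4\lambda^{-1} + 3\mu^5\lambda^{-2}$,

$\mu_4' = \mu^4 + 6\mu^5\lambda^{-1} + 15\mu^6\lambda^{-2} + 15\mu^7\lambda^{-3}$,

(12) $\mu_r' = \mu^r K_{r-1/2}(\phi)/K_{1/2}(\phi) = \mu^r\sum_{s=0}^{r-1}\frac{(r - 1 + s)!}{s!\,(r - 1 - s)!}(2\phi)^{-s}$.

The negative integral moments are given in (33).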
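As a consistency check, the finite sum in (12) can be compared in exact arithmetic with the explicit moments listed above. The comparison is made for arbitrary rational values of $\mu$ and $\lambda$, using Python's `fractions` module. The function name `raw_moment` and the test values are hypothetical.

```python
from fractions import Fraction
from math import factorial

def raw_moment(r, mu, lam):
    """(12): mu'_r = mu^r * sum_{s=0}^{r-1} (r-1+s)! / (s! (r-1-s)!) * (2 phi)^(-s), phi = lam/mu."""
    phi = lam / mu
    total = sum(Fraction(factorial(r - 1 + s), factorial(s) * factorial(r - 1 - s)) / (2 * phi) ** s
                for s in range(r))
    return mu**r * total

mu, lam = Fraction(3, 2), Fraction(5, 7)
moments = {r: raw_moment(r, mu, lam) for r in range(1, 5)}
```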







It follows from the form of (8) that the distribution of the arithmetic mean of a fixed number n of independent values from (1) is a member of the same family, with the same $\alpha$ and $\mu$, but with $\lambda$ replaced by $\lambda n$ and $\phi$ replaced by $\phi n$. More generally, suppose that we have a set of populations in which $\mu_i$ and $\lambda_i$ are the values of the parameters in the ith population. Suppose also that, although the values of these parameters are unknown, the values of $a_i = C\mu_i^{-2}\lambda_i$ are known, C being a constant whose value is not necessarily known. The distribution of the linear function $\sum_i a_i x_i$, where $x_i$ is an independent observation from the ith population, is then of the same form as (1), with $\mu = C\sum_i\phi_i$, $\lambda = C(\sum_i\phi_i)^2$, $\phi = \sum_i\phi_i$. Because of this additive property of the normality parameter, the linear function will have a more nearly normal distribution than any of its components.
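The additive property can be verified through the cgf (8b). For independent $x_i$, the cgf of $\sum_i a_i x_i$ is $\sum_i L_{x_i}(a_i t)$. The Python sketch below, using the standard library only, compares this sum with the cgf of a single Inverse Gaussian with $\mu = C\sum\phi_i$ and $\lambda = C(\sum\phi_i)^2$. It also checks the special case of the mean of n values. The populations, the constant C and the value of n are hypothetical.

```python
import math

def cgf(t, mu, lam):
    """(8b): L_x(t; mu, lam) = (lam/mu){1 - (1 + 2 mu^2 t / lam)^(1/2)}, L = log E(e^{-tx})."""
    return (lam / mu) * (1 - math.sqrt(1 + 2 * mu**2 * t / lam))

# Hypothetical populations (mu_i, lam_i) and constant C
pops = [(1.0, 2.0), (0.5, 3.0), (2.0, 8.0)]
C = 0.7
a = [C * lam / mu**2 for mu, lam in pops]        # a_i = C mu_i^{-2} lam_i
phi_sum = sum(lam / mu for mu, lam in pops)       # phi = sum of phi_i
mu_comb, lam_comb = C * phi_sum, C * phi_sum**2   # parameters claimed for sum a_i x_i

def cgf_of_linear_function(t):
    """Independence: log E exp(-t sum a_i x_i) = sum_i L_{x_i}(a_i t)."""
    return sum(cgf(ai * t, mu, lam) for ai, (mu, lam) in zip(a, pops))

# Special case: mean of n values from IG(mu, lam) should be IG(mu, n lam)
n, mu0, lam0 = 5, 1.3, 0.8
mean_cgf = lambda t: n * cgf(t / n, mu0, lam0)
```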


3. Estimation of parameters. Suppose that $x_i$ is an observation on a distribution of the form (1b), with parameter values $\mu$ and $\lambda_i$, where $\lambda_i = \lambda_0 w_i$ for i = 1 to N, $w_i$ being positive and known, but neither of the common values of $\mu$ and $\lambda_0$ being known. For example, $x_i$ might be the arithmetic mean of $w_i$ values from a distribution with parameter values $\mu$ and $\lambda_0$. With these N pairs of values of $x_i$ and $w_i$ as data, the estimates of $\mu$ and $\lambda_0$ which jointly maximize the likelihood function are given by

(13) $\hat\mu = \bar x_w = \sum_{i=1}^N w_i x_i \Big/ \sum_{i=1}^N w_i$,

(14) $\frac{1}{\hat\lambda_0} = \frac{1}{N}\sum_{i=1}^N w_i\Big(\frac{1}{x_i} - \frac{1}{\bar x_w}\Big)$.


These estimates can never be negative so long as the observations are necessarily non-negative. For (14) this follows from the convexity of the function $1/x$ for $x > 0$. With every $w_i$ equal to unity, these estimates were given by Schrödinger [3], who called them "wahrscheinlichste."
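A minimal Python sketch (standard library only) computes the estimates (13) and (14) and checks that they are a local maximum of the weighted log-likelihood built from (1b). The function names and the data are hypothetical. The sketch assumes the observations are not all equal; otherwise $1/\hat\lambda_0 = 0$.

```python
import math

def ig_ml(x, w=None):
    """ML estimates (13) and (14): returns (mu_hat, lam0_hat)."""
    n = len(x)
    if w is None:
        w = [1.0] * n
    W = sum(w)
    mu_hat = sum(wi * xi for wi, xi in zip(w, x)) / W                      # (13)
    inv_lam = sum(wi * (1 / xi - 1 / mu_hat) for wi, xi in zip(w, x)) / n   # (14)
    return mu_hat, 1 / inv_lam

def loglik(mu, lam0, x, w):
    """Log-likelihood of independent x_i ~ IG(mu, lam0 * w_i), form (1b)."""
    return sum(0.5 * math.log(lam0 * wi / (2 * math.pi * xi**3))
               - lam0 * wi * (xi - mu) ** 2 / (2 * mu**2 * xi)
               for xi, wi in zip(x, w))

x = [0.8, 1.5, 0.6, 2.4, 1.1]     # hypothetical observations
w = [1.0, 2.0, 1.0, 3.0, 2.0]     # hypothetical known weights
mu_hat, lam_hat = ig_ml(x, w)
best = loglik(mu_hat, lam_hat, x, w)
```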

The Inverse Gaussian family is one for which the weighted sample mean $\bar x_w = \hat\mu$ of (13) is a sufficient statistic (in Fisher's sense) for estimating the common population mean $\mu$. Further, the cumulant-generating function of $\hat\mu$, with fixed values of $\mu$, $\lambda_0$, N and the weights $w_1, \ldots, w_N$, differs from (8) only in that $\lambda$ becomes $\lambda_0\sum_{i=1}^N w_i$. (To see this, take $C = \mu^2/(\lambda_0\sum_{i=1}^N w_i)$ in the result at the end of Section 2.) The probability density function of $\hat\mu$ therefore is

(15) $f_2\big(\hat\mu; \mu, \lambda_0\sum_{i=1}^N w_i\big)$, $0 < \hat\mu < \infty$.


In the terminology of a previous paper [6], the family (1) is a Laplacian one with $\alpha$ as primary parameter and $\lambda$ as secondary parameter. Hence $\hat\mu$ has a Laplacian form of probability density function. This enables the conditional moments and cumulants and other properties of other statistics, with a fixed value of $\hat\mu$, to be found by using the uniqueness of the Laplace transforms which appear in their mathematical formulations. A number of exact results have been found for the Inverse Gaussian distributions in this way, and we shall now proceed to develop one of the more surprising of them.







4. Distribution of the ml estimator of the secondary parameter. With the same data and the same fixed quantities as were introduced in Section 3 in discussing the distribution of the maximum likelihood (ml) estimator of $\mu$, the Laplace transform of the probability density function of $1/\hat\lambda_0$ is

(16) $E(e^{-t/\hat\lambda_0}) = \int\!\cdots\!\int_{\text{all } X_i > 0} e^{-t/\hat\lambda_0}\prod_{i=1}^N f_2(X_i; \mu, \lambda_0 w_i)\,dX_i$.

This certainly exists when the real part of t is not negative. On substituting from (1) and (15) and writing $\sum_{i=1}^N w_i = W$ for brevity, we get


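(17) $E(e^{-t/\hat\lambda_0} \mid \mu, \lambda_0, w_1, \ldots, w_N, N) = \int f_2(\hat\mu; \mu, \lambda_0 W)\Big[\int\!\cdots\!\int_{\hat\mu\ \text{constant}} e^{-(t + N\lambda_0/2)/\hat\lambda_0}\,\hat\mu^{3/2}\Big(\frac{\lambda_0}{2\pi}\Big)^{(N-1)/2} W^{-1/2}\prod_{i=1}^N\big(X_i^{-3/2} w_i^{1/2}\,dX_i\big)\Big]$.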


The multiple integral in the final integrand on the right of (17) does not contain $\mu$ or $\alpha$. From the Laplacian form of $f_2(\hat\mu; \mu, \lambda_0 W)$ and the uniqueness of the Laplace transforms to which it gives rise, it follows (cf. [6]) that the partial derivative, with respect to $\hat\mu$, of this multiple integral is equal to the Laplace transform of the conditional density of $1/\hat\lambda_0$. This statement may be justified by reference either to Lerch's theorem ([7], p. 52; [8], p. 61) or to an equally applicable set of theorems (cf. [9], p. 38). The proof can legitimately involve $\alpha$ taking complex values with positive real parts. Therefore


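(18) $\hat\mu^{3/2}\Big(\frac{\lambda_0}{2\pi}\Big)^{(N-1)/2} W^{-1/2}\,\frac{\partial}{\partial\hat\mu}\Big[\int\!\cdots\!\int_{\hat\mu\ \text{constant}} e^{-(t + N\lambda_0/2)/\hat\lambda_0}\prod_{i=1}^N\big(X_i^{-3/2} w_i^{1/2}\,dX_i\big)\Big] = E(e^{-t/\hat\lambda_0} \mid \hat\mu; \lambda_0, w_1, \ldots, w_N, N)$,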


which is the conditional moment-generating function of $1/\hat\lambda_0$ with $\hat\mu$ fixed. To evaluate this integral, first take $t = 0$, which gives


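(19) $\frac{\partial}{\partial\hat\mu}\Big[\int\!\cdots\!\int_{\hat\mu\ \text{constant}} e^{-N\lambda_0/(2\hat\lambda_0)}\prod_{i=1}^N\big(X_i^{-3/2}\,dX_i\big)\Big] = \Big(\frac{2\pi}{\lambda_0}\Big)^{(N-1)/2}\frac{W^{1/2}\,\hat\mu^{-3/2}}{\prod_{i=1}^N w_i^{1/2}}$.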


By substituting $\lambda_0 + 2t/N$ for $\lambda_0$ on both sides of (19), the left-hand side of (18) is found almost immediately to be

(20) $E(e^{-t/\hat\lambda_0} \mid \hat\mu; \lambda_0, w_1, \ldots, w_N, N) = \Big(1 + \frac{2t}{N\lambda_0}\Big)^{-(N-1)/2}$.
0 


This is a Laplace transform of a density function of the chi-square type, with N - 1 degrees of freedom. In fact, it shows that

(21) $\frac{1}{\hat\lambda_0} \sim \frac{\chi^2_{(N-1\ \text{d.f.})}}{N\lambda_0}$,







and thus that

(22) $\sum_{i=1}^N w_i\Big(\frac{1}{x_i} - \frac{1}{\bar x_w}\Big) = \frac{N}{\hat\lambda_0} \sim \frac{\chi^2_{(N-1\ \text{d.f.})}}{\lambda_0}$.

This result can be used to obtain confidence intervals for $\lambda_0$. By substituting for the probability elements of $\hat\mu$ and $\hat\lambda_0$ in the joint probability element of the N observations, it can be shown also that $\hat\mu$ and $\hat\lambda_0$ are jointly sufficient estimators of $\mu$ and $\lambda_0$. Further,

(23) $\frac{N}{(N - 1)\hat\lambda_0} = \frac{1}{N - 1}\sum_{i=1}^N w_i\Big(\frac{1}{x_i} - \frac{1}{\bar x_w}\Big)$

is an unbiased estimator of $1/\lambda_0$. Its distribution is of exactly the same type as that of the usual unbiased quadratic estimator of the variance of a Gaussian distribution, although it cannot be expressed precisely as a sum of squares of Gaussian variates with zero means. The conditional distribution of (23) is necessarily independent of $\mu$, because of the sufficiency of $\hat\mu$. It is also independent of $\hat\mu$, which makes possible an analogue of the analysis of variance, using the existing tables of $\chi^2$ and F for significance tests and so forth. In the Brownian motion problem, $\mu$ and $\lambda$ are concerned with rather different physical properties of the experimental system, which do however occur together in some physical formulae. It is therefore convenient that estimators have been found which are both independently distributed and jointly sufficient.
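A Monte Carlo sketch in Python with NumPy can illustrate (22) and (23). It draws from the Inverse Gaussian using NumPy's `Generator.wald(mean, scale)`, whose density is (1b) with $\lambda$ equal to `scale`. The seed, $\mu$, $\lambda_0$ and the weights $w_i$ are arbitrary assumptions. If (22) holds, $\lambda_0\sum w_i(1/x_i - 1/\bar x_w)$ should have mean $N - 1$ and variance $2(N - 1)$. The estimator (23) should average to $1/\lambda_0$. The sample correlation with $\bar x_w$ should be near zero, which is consistent with, though of course weaker than, the independence stated above.

```python
import numpy as np

rng = np.random.default_rng(12345)
mu, lam0 = 1.5, 2.0
w = np.array([1.0, 2.0, 1.0, 3.0, 2.0])      # hypothetical known weights w_i
N, reps = len(w), 200_000

# x_i ~ IG(mu, lam0 * w_i): scale broadcasts across the N columns
x = rng.wald(mu, lam0 * w, size=(reps, N))
xbar_w = (x * w).sum(axis=1) / w.sum()                   # (13)
s = (w * (1 / x - 1 / xbar_w[:, None])).sum(axis=1)     # left side of (22), equal to N / lam0_hat
chi2 = lam0 * s                                          # should behave like chi-square, N - 1 d.f.

mean_chi2, var_chi2 = chi2.mean(), chi2.var()            # expect about N - 1 and 2(N - 1)
unbiased = s / (N - 1)                                   # (23), estimates 1/lam0
corr = np.corrcoef(xbar_w, s)[0, 1]                      # expect about 0
```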

The statistic (14) appears also in the likelihood ratio test of the hypothesis that the population means are equal, against the alternative hypothesis that they may have any values independently of one another. The values for the means are not specified, while the value of the secondary parameter is supposed known. The logarithm of the ratio of the maximum likelihoods under these two hypotheses is $-\lambda_0 N/(2\hat\lambda_0)$, so that the result (21) is a case where the well-known approximate general result ([9], p. 151) holds exactly. Moreover, the statistic $1/\hat\lambda_0$ depends essentially on the difference between two ml estimators: that of the reciprocal of the hypothetically common value for $\mu$, and that of the weighted mean of the reciprocals of the means under the assumption of their complete independence. Both these considerations indicate that $1/\hat\lambda_0$ will tend to be increased by real differences between the population means. It therefore measures the combined effect of the dispersion in homogeneous samples and the heterogeneity of the means.
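The identity behind this remark is easy to check directly. With $\lambda_0$ treated as known, the sketch maximizes the log-likelihood of (1b) under both hypotheses: at $\bar x_w$ for a common mean, and at $\mu_i = x_i$ when the means are free. It then compares the log of the likelihood ratio with $-\lambda_0 N/(2\hat\lambda_0)$, where $\hat\lambda_0$ is computed from (14). The sketch is plain Python. The data, weights and $\lambda_0$ are hypothetical.

```python
import math

def loglik(mus, lam0, x, w):
    """Log-likelihood of independent x_i ~ IG(mu_i, lam0 * w_i), form (1b)."""
    return sum(0.5 * math.log(lam0 * wi / (2 * math.pi * xi**3))
               - lam0 * wi * (xi - mi) ** 2 / (2 * mi**2 * xi)
               for xi, wi, mi in zip(x, w, mus))

x = [0.8, 1.5, 0.6, 2.4, 1.1]     # hypothetical data
w = [1.0, 2.0, 1.0, 3.0, 2.0]     # hypothetical weights
lam0 = 2.0                        # secondary parameter treated as known
N, W = len(x), sum(w)
xbar_w = sum(wi * xi for wi, xi in zip(w, x)) / W
inv_lam_hat = sum(wi * (1 / xi - 1 / xbar_w) for wi, xi in zip(w, x)) / N   # (14)

ll_common = loglik([xbar_w] * N, lam0, x, w)   # equal means: maximized at the weighted mean
ll_free = loglik(x, lam0, x, w)                # free means: maximized at mu_i = x_i
log_ratio = ll_common - ll_free                # expected: -lam0 * N * inv_lam_hat / 2
```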


5. An analogue of the analysis of variance for nested classifications. The algebraic aspect of the analysis of variance for nested classifications may be generalized, for two classifications (which is a sufficiently general case), to

(24) $\sum_{i=1}^N\sum_{j=1}^{n_i}\{\psi(x_{ij}) - \psi(x_{..})\} = \sum_{i=1}^N\sum_{j=1}^{n_i}\{\psi(x_{ij}) - \psi(x_{i.})\} + \sum_{i=1}^N n_i\{\psi(x_{i.}) - \psi(x_{..})\}$.







Here $x_{ij}$ is the observed value of the variate in the jth subclass of the ith major class, and $n_i$ is the number of subclasses in this major class, while there are N major classes. The values of $x_{i.}$ and $x_{..}$ are means of some kind of the values of $x_{ij}$, and $\psi(x)$ is some suitable function of x. The sums might be regarded as depending on the differences between different kinds of means, in that they could be rewritten as

(25) $n_{.}\Big\{\frac{1}{n_{.}}\sum_{i=1}^N\sum_{j=1}^{n_i}\psi(x_{ij}) - \psi(x_{..})\Big\} = \sum_{i=1}^N n_i\Big\{\frac{1}{n_i}\sum_{j=1}^{n_i}\psi(x_{ij}) - \psi(x_{i.})\Big\} + n_{.}\Big\{\frac{1}{n_{.}}\sum_{i=1}^N n_i\psi(x_{i.}) - \psi(x_{..})\Big\}$,

where $n_{.} = \sum_{i=1}^N n_i$. That is to say, temporarily let M stand for the operation of taking the relevant mean involved in $x_{i.}$ and $x_{..}$, and A for the operation of taking the weighted arithmetic mean. The identity (25) can then be written

(26) $A_iA_j\psi - \psi M_iM_j = A_i(A_j\psi - \psi M_j) + (A_i\psi - \psi M_i)M_j$,

operating on $x_{ij}$. Certain restrictions on $\psi$ and M are necessary to ensure that these differences, which are essentially measures of dispersion, shall never change sign. In the analysis of variance, the means entailed by M are arithmetical, while $\psi(x) = x^2$. If the variates have independent Gaussian distributions with both means and variances equal within any major group, the two major sums on the right side of (24) have independent distributions. This independence does not generally occur in other circumstances, but it is available to some extent with the Inverse Gaussian distribution. For this, according to Section 4, we again take the means M to be arithmetical, but take $\psi(x) = x^{-1}$.
From the results obtained in Section 4, we see that the statistic

(27) $\sum_{j=1}^{n_i}\Big(\frac{1}{x_{ij}} - \frac{1}{x_{i.}}\Big)$

is distributed as $\chi^2/\lambda_i$ with $n_i - 1$ degrees of freedom, independently of $x_{i.} = \sum_j x_{ij}/n_i$. Hence, when every $\lambda_i$ equals a common value $\lambda$,

$\sum_{i=1}^N\sum_{j=1}^{n_i}\Big(\frac{1}{x_{ij}} - \frac{1}{x_{i.}}\Big)$

is distributed as $\chi^2/\lambda$ with $n_{.} - N$ degrees of freedom, independently of the values of $x_{i.}$. (This distribution would remain true even if the expectations $E(x_{i.})$ varied with i.) In particular, if $x_{..} = \sum_{i=1}^N n_i x_{i.}/n_{.}$, that double sum is distributed independently of

$\sum_{i=1}^N n_i\Big(\frac{1}{x_{i.}} - \frac{1}{x_{..}}\Big)$,

which is itself distributed as $\chi^2/\lambda$ with (N - 1) degrees of freedom.





INVERSE GAUSSIAN DISTRIBUTIONS. I. 
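The algebraic identity (24) thus becomes

$$\sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(\frac{1}{x_{ij}}-\frac{1}{x_{\cdot\cdot}}\right) = \sum_{i=1}^{N}\sum_{j=1}^{n_i}\left(\frac{1}{x_{ij}}-\frac{1}{x_{i\cdot}}\right) + \sum_{i=1}^{N} n_i\left(\frac{1}{x_{i\cdot}}-\frac{1}{x_{\cdot\cdot}}\right). \tag{28}$$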


If all the observations come independently from the same Inverse Gaussian distribution, the three major sums in (28) are each distributed as $\chi^2/\lambda$, the chi-squares having respectively $n_\cdot - 1$, $n_\cdot - N$, and $N - 1$ degrees of freedom. Thus $1/\lambda$ can be estimated by dividing any of the three sums in (28) by the corresponding number of degrees of freedom. The two sums on the right of (28) have independent distributions, and therefore the ratio of the second to the first is distributed as

$$(N - 1)F/(n_\cdot - N),$$


where $F$ has $N - 1$ and $n_\cdot - N$ degrees of freedom. Hence the analogy with the analysis of variance is very close. For example, a significance test of the differences between the $N$ values of $x_{i\cdot}$ may be made by using the first major sum on the right of (28) as the analogue of the sum of squares for error. Some illustrations of the use of these formulae will be published separately, for some electrophoretic data on individual colloid particles and for some purely empirical trials on more general data.
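To make the analogy with the analysis of variance concrete, here is a minimal sketch, on hypothetical data, that computes the three sums of (28) and the variance ratio $F = \{\text{second sum}/(N-1)\}/\{\text{first sum}/(n_\cdot - N)\}$ implied by the distribution $(N-1)F/(n_\cdot - N)$ stated above. The function name is illustrative; comparing $F$ with tabulated percentage points is left to the reader.

```python
def analysis_of_reciprocals(groups):
    """Return the sums of (28) and the F ratio for a nested (one-way) layout.

    groups[i] holds the n_i positive observations x_ij of major class i.
    """
    sizes = [len(g) for g in groups]
    n_dot, n_classes = sum(sizes), len(groups)
    means = [sum(g) / len(g) for g in groups]                   # x_i.
    grand = sum(n * m for n, m in zip(sizes, means)) / n_dot    # x..
    total = sum(1 / x - 1 / grand for g in groups for x in g)
    within = sum(1 / x - 1 / m for g, m in zip(groups, means) for x in g)
    between = sum(n * (1 / m - 1 / grand) for n, m in zip(sizes, means))
    # between/within is distributed as (N - 1)F/(n. - N): rescale by degrees of freedom.
    f_ratio = (between / (n_classes - 1)) / (within / (n_dot - n_classes))
    return {"total": total, "within": within, "between": between,
            "F": f_ratio, "df": (n_classes - 1, n_dot - n_classes)}

result = analysis_of_reciprocals([[1.0, 2.0], [2.0, 4.0]])
print(result)  # within is about 0.25, between about 0.2222, F about 1.7778 on (1, 2) df
```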

This "analysis of reciprocals" by (28) is invariant under changes of scale of the observations, but not under more general linear transformations, whereas the analysis of variance is invariant under them. It should also be noted that the obvious parallel with the algebraic identity for the main effects and interactions in the analysis of variance for crossed classifications does not give independent components. An interaction term, such as

$$\sum_{i}\sum_{j}\left(\frac{1}{x_{ij}}-\frac{1}{x_{i\cdot}}-\frac{1}{x_{\cdot j}}+\frac{1}{x_{\cdot\cdot}}\right)$$

in a commonly used notation, does not have a distribution of the chi-square type, since it has a non-zero probability of taking a negative value; therefore this analogue of the analysis of variance is restricted to nested classifications.
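A small hypothetical example shows why the crossed-classification analogue fails: for the $2 \times 2$ table below (one observation per cell, arithmetic row, column and grand means assumed) the interaction term is negative, which no multiple of a chi-square variate can be.

```python
def reciprocal_interaction(table):
    """Sum over cells of 1/x_ij - 1/x_i. - 1/x_.j + 1/x.. for a two-way table."""
    r, c = len(table), len(table[0])
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row_means) / r
    return sum(1 / table[i][j] - 1 / row_means[i] - 1 / col_means[j] + 1 / grand
               for i in range(r) for j in range(c))

value = reciprocal_interaction([[1.0, 1.0], [1.0, 100.0]])
print(round(value, 4))  # -0.9139
```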


6. Distribution of the reciprocal of an Inverse Gaussian variate. For some purposes it is convenient to work with the reciprocal of the Inverse Gaussian variate $x$, which will be denoted by $y$. For example, the analysis discussed in Section 5 can be expressed simply in terms of this variable. The weighted arithmetic means $x_{i\cdot}$, $x_{\cdot\cdot}$ of the values of $x_{ij}$ are replaced by their reciprocals, which are the weighted harmonic means $\bar y_{i\cdot}$, $\bar y_{\cdot\cdot}$ of the values of $y_{ij}$. The analysis of the values of $y_{ij}$, corresponding to the algebraic identity (24) or (26), with $\psi(y) = y$ and $M$ the harmonic mean, thus becomes

$$\sum_{i=1}^{N}\sum_{j=1}^{n_i}(y_{ij}-\bar y_{\cdot\cdot}) = \sum_{i=1}^{N}\sum_{j=1}^{n_i}(y_{ij}-\bar y_{i\cdot}) + \sum_{i=1}^{N} n_i(\bar y_{i\cdot}-\bar y_{\cdot\cdot}). \tag{29}$$







These sums of course have the same chi-square types of distribution as the expressions in terms of $x_{ij}$, $x_{i\cdot}$, and $x_{\cdot\cdot}$, to which they are equal. However, this analysis (29) is sufficiently easy to compute to be considered as a further practical analogue of the analysis of variance for certain purposes.

Some of the useful properties of the variate y follow in an obvious way from 
those of x, hardly justifying giving any special consideration to the family of dis- 
tributions of y. However, the latter has some interesting features and a short 
account is therefore in order. Some of the results will be expressed in terms of 
x, since that variate is the primary object of this study. 

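The probability density function of $y$ may be written

$$f(y) = \left[\frac{\lambda}{2\pi y}\right]^{1/2}\exp\left\{-\frac{\lambda(1-\mu y)^2}{2\mu^2 y}\right\} \qquad (y > 0) \tag{30}$$

$$\phantom{f(y)} = \mu y\, f_1(y;\mu^{-1},\lambda\mu^{-2}) = \mu^2 y\, f_1(\mu y;1,\phi), \tag{31}$$

where $f_1(x;\mu,\lambda)$ denotes the Inverse Gaussian density of $x$ with mean $\mu$ and parameter $\lambda$.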


The mode is at

$$y_{\text{mode}} = \frac{(1+4\phi^2)^{1/2}-1}{2\phi\mu} = \frac{2\phi}{\mu\,[1+(1+4\phi^2)^{1/2}]}.$$

The probability density at the mode is

$$\mu\left\{\frac{1+(1+4\phi^2)^{1/2}}{4\pi}\right\}^{1/2}\exp\left[\phi-\left(\phi^2+\tfrac{1}{4}\right)^{1/2}\right],$$

which approaches $\mu/(2\pi e)^{1/2} = 0.241971\mu$ as its limit when $\phi$ or $\lambda$ decreases to zero with a fixed value of $\mu$.

Fig. 3 shows some examples of (30) plotted for six values of $\lambda$ (among them $\lambda = 0$, 1, 4, 16 and 32), with $\mu = 1$, for $0 < y < 3$, corresponding to $1/3 < x < \infty$. The difference between Figs. 1 and 3 for small values of $\lambda$ is rather striking.

Fig. 4 shows density curves from (30) in a form comparable with Fig. 2, having $\lambda = 1$ with three values of $\mu$ (among them $\mu = 1$ and 4), for $0 < y < 7$. Thus the harmonic sample mean is a sufficient statistic for discriminating between the distributions of the family to which the curves in Fig. 4 belong, while the arithmetic mean is the corresponding statistic for Fig. 2 (cf. Section 3). Some consequences of using the arithmetic mean of observed values of $y$ to estimate $1/\mu$ instead of using the harmonic mean are discussed in Section 7.

The moments about zero of $y$, which are the moments about zero of negative order for the Inverse Gaussian variate $x$, may be found by direct integration, using (3), or from other results. They may be found from those of $x$ of positive order by using the relationship

$$E[(x/\mu)^{-r}] = E[(x/\mu)^{r+1}],$$

from which, in a notation applying to the variate $x$,

$$\mu'_{-r} = \mu'_{r+1}/\mu^{2r+1}. \tag{32}$$

Thus the moments of all positive and negative orders exist for an Inverse Gaussian variate (and for its reciprocal), in contrast to the situation with some other superficially similar distributions, such as the chi-square type. For reference, we have

$$\mu'_{-1} = \mu^{-1} + \lambda^{-1},$$

$$\mu'_{-2} = \mu^{-2} + 3\mu^{-1}\lambda^{-1} + 3\lambda^{-2},$$

$$\mu'_{-3} = \mu^{-3} + 6\mu^{-2}\lambda^{-1} + 15\mu^{-1}\lambda^{-2} + 15\lambda^{-3},$$

$$\mu'_{-4} = \mu^{-4} + 10\mu^{-3}\lambda^{-1} + 45\mu^{-2}\lambda^{-2} + 105\mu^{-1}\lambda^{-3} + 105\lambda^{-4},$$

$$\mu'_{-r} = (2\lambda)^{-r}\sum_{s=0}^{r}\frac{(r+s)!}{s!\,(r-s)!}(2\phi)^{r-s} = \mu^{-r}\sum_{s=0}^{r}\frac{(r+s)!}{s!\,(r-s)!}(2\phi)^{-s}. \tag{33}$$

Fig. 3. Probability density curves for the reciprocal of an Inverse Gaussian variate with $\mu = 1$ for 6 values of $\lambda$ or $\phi$.
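The finite sum (33) can be checked against direct numerical integration of $y^r f(y)$ with $f$ from (30). The sketch below uses hypothetical parameters $\mu = 1$, $\lambda = 2$ (so $\phi = 2$) and a plain midpoint rule, which is very accurate here because the integrand and all its derivatives vanish at $y = 0$ and are negligible at the upper limit.

```python
import math

def pdf_y(y, mu, lam):
    """Density (30) of the reciprocal y = 1/x of an Inverse Gaussian variate."""
    return math.sqrt(lam / (2 * math.pi * y)) * math.exp(-lam * (1 - mu * y) ** 2 / (2 * mu ** 2 * y))

def negative_moment(r, mu, lam):
    """mu'_{-r} = E(x^{-r}) = E(y^r), from the second form of (33)."""
    phi = lam / mu
    return mu ** (-r) * sum(
        math.factorial(r + s) / (math.factorial(s) * math.factorial(r - s)) * (2 * phi) ** (-s)
        for s in range(r + 1))

def moment_by_quadrature(r, mu, lam, upper=40.0, steps=100_000):
    """Midpoint-rule approximation of the integral of y^r f(y) over (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** r * pdf_y((i + 0.5) * h, mu, lam) for i in range(steps))

mu, lam = 1.0, 2.0          # hypothetical parameter values
for r in range(4):
    print(r, negative_moment(r, mu, lam), round(moment_by_quadrature(r, mu, lam), 8))
```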

The family (30) is of the Laplacian form as regards the variate $y$. Thus its c.g.f. can be found by a process of substituting alternative values for the parameters in the integral of (30), in a similar way to that used for deriving the corresponding function (8) for $x$. The result is that the logarithm of the Laplace transform of the density function of $y$ is

$$L_y(t) = \phi\{1-(1+2\lambda^{-1}t)^{1/2}\} - \tfrac{1}{2}\ln(1+2\lambda^{-1}t). \tag{34}$$







It is curious that the form of this function (34) shows that the distribution of $y$ is the same as that of the convolution or composition of an Inverse Gaussian distribution (with the same value of $\phi$ but with $\mu$ replaced by $1/\mu$) with an independent distribution of $\chi^2/\lambda$, this $\chi^2$ having one degree of freedom.
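Both statements can be checked numerically: (34) against a quadrature of $E(e^{-ty})$, and the convolution reading by adding the log transform of an Inverse Gaussian law with mean $1/\mu$ and parameter $\lambda/\mu^2$ (hence the same $\phi$) to that of $\chi^2_1/\lambda$. This is a sketch with hypothetical parameter values; the Inverse Gaussian transform used, $\log E e^{-tX} = (\lambda/\mu)\{1 - (1 + 2\mu^2 t/\lambda)^{1/2}\}$, is the standard one.

```python
import math

def pdf_y(y, mu, lam):
    """Density (30) of y = 1/x."""
    return math.sqrt(lam / (2 * math.pi * y)) * math.exp(-lam * (1 - mu * y) ** 2 / (2 * mu ** 2 * y))

def log_laplace_y(t, mu, lam):
    """Right-hand side of (34)."""
    phi = lam / mu
    z = 1 + 2 * t / lam
    return phi * (1 - math.sqrt(z)) - 0.5 * math.log(z)

def log_laplace_ig(t, mean, shape):
    """Log Laplace transform of an Inverse Gaussian law with the given mean and parameter."""
    return (shape / mean) * (1 - math.sqrt(1 + 2 * mean ** 2 * t / shape))

def log_laplace_chi2_over_lam(t, lam, df=1):
    """Log Laplace transform of chi-square(df) / lam."""
    return -0.5 * df * math.log(1 + 2 * t / lam)

def laplace_by_quadrature(t, mu, lam, upper=40.0, steps=100_000):
    h = upper / steps
    return h * sum(math.exp(-t * (i + 0.5) * h) * pdf_y((i + 0.5) * h, mu, lam) for i in range(steps))

mu, lam, t = 1.0, 2.0, 0.7      # hypothetical values
print(math.exp(log_laplace_y(t, mu, lam)), laplace_by_quadrature(t, mu, lam))
print(log_laplace_y(t, mu, lam),
      log_laplace_ig(t, 1 / mu, lam / mu ** 2) + log_laplace_chi2_over_lam(t, lam))
```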

The first two cumulants of $y$ are

$$\kappa_1(y) = \mu^{-1} + \lambda^{-1}, \qquad \kappa_2(y) = \mu^{-1}\lambda^{-1} + 2\lambda^{-2}. \tag{35}$$

Thus $\lambda^{-1}$ is the bias in using $y$ (or $x^{-1}$) as an estimator of $\mu^{-1}$. The variate $y$ itself may be the harmonic sample mean of values of a similar variate, or the reciprocal of the weighted arithmetic sample mean of values of Inverse Gaussian variates as in Section 3. The mean squared error in using $y$ as an estimator of $\mu^{-1}$ is

$$E[(y - \mu^{-1})^2] = (\phi + 3)\lambda^{-2}. \tag{36}$$


Fig. 4. Probability density curves for the reciprocal of an Inverse Gaussian variate with $\lambda = 1$ for 3 values of $\mu$ or $\phi$.


The first two Fisherian shape coefficients of $y$ are

$$\gamma_1(y) = (3\phi + 8)(\phi + 2)^{-3/2}, \tag{37}$$

$$\gamma_2(y) = 3(5\phi + 16)(\phi + 2)^{-2}. \tag{38}$$

The fractional coefficient of variation is

$$\gamma_{-1}(y) = (\phi + 1)^{-1}(\phi + 2)^{1/2}. \tag{39}$$

The values of $\gamma_{-1}$, $\gamma_1$, and $\gamma_2$ are smaller than the values of the corresponding characteristics of $x$. When $\phi$ is increased, both these and the shape coefficients of higher order approach zero as their limit, so that the distribution of $y$ then approaches normality.
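Formulas (35) and (37) to (39) can be checked against cumulants obtained by numerically integrating (30). The sketch assumes the hypothetical values $\mu = 1$, $\lambda = 2$ and uses the standard conversion from raw moments to the first four cumulants. It also confirms, for a few values of $\phi$, that each coefficient is smaller than its counterpart for $x$, namely $\phi^{-1/2}$, $3\phi^{-1/2}$ and $15/\phi$.

```python
import math

def pdf_y(y, mu, lam):
    """Density (30) of y = 1/x."""
    return math.sqrt(lam / (2 * math.pi * y)) * math.exp(-lam * (1 - mu * y) ** 2 / (2 * mu ** 2 * y))

def raw_moments(mu, lam, upper=40.0, steps=100_000):
    """E(y^r) for r = 0..4 by the midpoint rule on (0, upper)."""
    h = upper / steps
    m = [0.0] * 5
    for i in range(steps):
        y = (i + 0.5) * h
        w = pdf_y(y, mu, lam) * h
        for r in range(5):
            m[r] += w * y ** r
    return m

def numerical_shape(mu, lam):
    """Cumulants from raw moments, then the quantities of (35) and (37)-(39)."""
    m = raw_moments(mu, lam)
    k1 = m[1]
    k2 = m[2] - m[1] ** 2
    k3 = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
    k4 = m[4] - 4 * m[3] * m[1] - 3 * m[2] ** 2 + 12 * m[2] * m[1] ** 2 - 6 * m[1] ** 4
    return {"k1": k1, "k2": k2, "g1": k3 / k2 ** 1.5, "g2": k4 / k2 ** 2, "cv": math.sqrt(k2) / k1}

def closed_shape(mu, lam):
    """Closed forms (35) and (37)-(39), with phi = lam / mu."""
    phi = lam / mu
    return {"k1": 1 / mu + 1 / lam, "k2": 1 / (mu * lam) + 2 / lam ** 2,
            "g1": (3 * phi + 8) * (phi + 2) ** -1.5, "g2": 3 * (5 * phi + 16) * (phi + 2) ** -2,
            "cv": math.sqrt(phi + 2) / (phi + 1)}

num, exact = numerical_shape(1.0, 2.0), closed_shape(1.0, 2.0)
for key in exact:
    print(key, round(num[key], 8), exact[key])
```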


7. Estimation of the arithmetic mean reciprocal of expectations of Inverse 
Gaussian variates. In the physical experiments which led to this research, it 
was desired to estimate the arithmetical mean reciprocal of the population 
means of four distributions which might reasonably be treated as Inverse Gaus- 
sian. The four means were not individually of special interest, their inequality 
being due to an artifact of the measuring technique. The secondary parameter 
\ could be considered to be constant in any one experiment. This estimation 
problem may be discussed in the following more general terms: 

Suppose that we have $N$ populations of the kind (30), the $i$-th having parameter values $\mu_i$, $\lambda_i = \lambda_0 w_i$, and one observation $y_i$ from each population. Write $\bar y = \sum_{i=1}^{N}(w_i y_i)/\sum_{i=1}^{N}(w_i)$. It follows from the form of (34) that the distribution of $\bar y$ is the same as that of the simple arithmetic mean of $N$ values of $y$ taken from one distribution whose parameter values are

$$\mu^* = \sum_{i=1}^{N}(w_i)\Big/\sum_{i=1}^{N}(w_i/\mu_i), \qquad \lambda^* = \lambda_0\sum_{i=1}^{N}(w_i)\Big/N.$$

We may also write $\phi^* = \lambda^*/\mu^*$. The distribution of $\bar y$ is the convolution of an Inverse Gaussian distribution, whose parameters in the form (1b) have the values $1/\mu^*$, $\lambda^* N$, with an independent distribution of $\chi^2/(\lambda^* N)$, this chi-square having $N$ degrees of freedom. Because this belongs to a different Laplacian family, it will not be studied in detail here. The results needed at present can be obtained from those already found.


From (35),

$$E(\bar y) = (1 + 1/\phi^*)/\mu^*, \tag{40}$$

$$E[(\bar y - 1/\mu^*)^2] = (\phi^* + 2 + N)/(\lambda^{*2}N). \tag{41}$$
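The closed forms (40) and (41) can be rebuilt term by term from the cumulants (35) of each $y_i$: the mean of $\bar y$ is the weighted mean of $1/\mu_i + 1/\lambda_i$, and its variance is $\sum w_i^2\kappa_2(y_i)/(\sum w_i)^2$. The sketch below, with hypothetical values of $\mu_i$, $w_i$ and $\lambda_0$, compares the two routes.

```python
def pooled_parameters(mus, weights, lam0):
    """mu* and lambda* for populations with lambda_i = lam0 * w_i."""
    total_w = sum(weights)
    mu_star = total_w / sum(w / m for w, m in zip(weights, mus))
    lam_star = lam0 * total_w / len(mus)
    return mu_star, lam_star

def moments_term_by_term(mus, weights, lam0):
    """E(ybar) and E[(ybar - 1/mu*)^2] from the cumulants (35) of each y_i."""
    total_w = sum(weights)
    lams = [lam0 * w for w in weights]
    mean = sum(w * (1 / m + 1 / l) for w, m, l in zip(weights, mus, lams)) / total_w
    var = sum(w ** 2 * (1 / (m * l) + 2 / l ** 2) for w, m, l in zip(weights, mus, lams)) / total_w ** 2
    mu_star, _ = pooled_parameters(mus, weights, lam0)
    return mean, var + (mean - 1 / mu_star) ** 2

def closed_forms_40_41(mus, weights, lam0):
    """Right-hand sides of (40) and (41)."""
    mu_star, lam_star = pooled_parameters(mus, weights, lam0)
    phi_star, n = lam_star / mu_star, len(mus)
    return (1 + 1 / phi_star) / mu_star, (phi_star + 2 + n) / (lam_star ** 2 * n)

mus, weights, lam0 = [1.0, 2.0, 0.5], [1.0, 2.0, 3.0], 1.5   # hypothetical values
print(moments_term_by_term(mus, weights, lam0), closed_forms_40_41(mus, weights, lam0))
```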
If the $N$ values of $\mu_i$ were all equal, the harmonic sample mean

$$\hat y = \sum_{i=1}^{N}(w_i)\Big/\sum_{i=1}^{N}(w_i y_i^{-1})$$

would be a more precise estimator of the common value. The formula (36) would then give

$$E[(\hat y - 1/\mu)^2] = (\phi^* + 3/N)/(\lambda^{*2}N), \tag{42}$$

which is less than (41) except in the trivial case of $N = 1$. The efficiency of $\bar y$, in these circumstances, can be measured by the ratio of the mean squared errors (42) and (41), or by the ratio, to $N$, of the modified value for $N$ which needs to be substituted (without changing $\lambda^*$ and $\phi^*$) in (42) to give a mean squared error for $\hat y$ equal to (41). The former measure of efficiency is easier to calculate, but is slightly less than the latter. However, the difference is less than one per cent if $\phi^* > 18.7$.
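Both efficiency measures are easy to tabulate. In the sketch below (function names illustrative), the second measure solves $(\phi^* + 3/N')/N' = (\phi^* + 2 + N)/N$ for $N'$, that is, the quadratic $cN'^2 - \phi^* N' - 3 = 0$ with $c = (\phi^* + 2 + N)/N$, and returns $N'/N$. The 18.7 threshold quoted above is not checked here; the loop only prints a small table for inspection.

```python
import math

def efficiency_ratio(phi_star, n):
    """Ratio of the mean squared errors (42) to (41); lambda* cancels."""
    return (phi_star + 3 / n) / (phi_star + 2 + n)

def efficiency_equivalent_n(phi_star, n):
    """N'/N, where N' substituted in (42) reproduces the mean squared error (41)."""
    c = (phi_star + 2 + n) / n
    n_prime = (phi_star + math.sqrt(phi_star ** 2 + 12 * c)) / (2 * c)
    return n_prime / n

for phi in (1.0, 5.0, 18.7, 50.0):
    row = [(round(efficiency_ratio(phi, n), 4), round(efficiency_equivalent_n(phi, n), 4))
           for n in (2, 4, 8)]
    print(phi, row)
```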

Reverting to the estimation of $\mu^*$ when the values of $\mu_i$ may be unequal, we may attempt to improve the estimator $\bar y$ by adjusting it for the bias. When the $N$ values of $y_i$ are the harmonic sample means of the reciprocals of Inverse Gaussian variates, or the reciprocals of arithmetic sample means of Inverse Gaussian variates, separate estimates of the values of $\lambda_i$ can be obtained from the variation exhibited within the samples, by using the appropriate form of (27). If the $i$-th sample contains $n_i$ observations $y_{ij}$,

$$v_i = \sum_{j=1}^{n_i}(y_{ij} - y_i) \ \text{ is distributed as } \ n_i\chi^2/\lambda_0 w_i \ \text{ with } n_i - 1 \text{ degrees of freedom},$$

this distribution being independent of $y_i$. On weighting these estimators suitably and on writing $\sum_{i=1}^{N}(n_i - 1) = D$, we have, as an unbiased estimator of $\lambda_0^{-1}$ of minimum variance,

$$\sum_{i=1}^{N}(n_i^{-1}w_i v_i)/D, \ \text{ which is distributed as } \ \chi^2/\lambda_0 D \ \text{ with } D \text{ degrees of freedom}.$$

Thus, for an unbiased estimator of $1/\mu^*$, we get

$$\bar y' = \left\{\sum_{i=1}^{N} w_i y_i - N\sum_{i=1}^{N}(n_i^{-1}w_i v_i)/D\right\}\Big/\sum_{i=1}^{N}(w_i). \tag{43}$$

Then

$$E[(\bar y' - 1/\mu^*)^2] = (\phi^* + 2 + 2N/D)/(\lambda^{*2}N). \tag{44}$$

This mean squared error will always be less than that (41) of $\bar y$ if $D > 2$, which will be true of most experiments. However, unless $\phi^*$ is close to or less than unity, which seems unlikely to occur, the statistical superiority of $\bar y'$ over $\bar y$ is of very minor importance.
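A minimal sketch of the bias-adjusted estimator (43), under the stated assumptions: each sample's $y_i$ is the harmonic mean of its observed $y_{ij}$, and the weights satisfy $\lambda_i = \lambda_0 w_i$ (for example $w_i \propto n_i$ when all observations share one $\lambda$). The data and function names are hypothetical. The two helper functions evaluate (41) and (44) so that the comparison for $D > 2$ can be seen directly.

```python
def harmonic_mean(values):
    return len(values) / sum(1 / v for v in values)

def bias_adjusted_estimate(samples, weights):
    """Return (ybar, estimate of 1/lambda_0, ybar') following (43).

    samples[i] holds the observed y_ij of the i-th population; weights[i] is w_i.
    """
    n_pop = len(samples)
    y = [harmonic_mean(s) for s in samples]                          # y_i
    v = [sum(val - yi for val in s) for s, yi in zip(samples, y)]    # within sums v_i, form (27)
    d = sum(len(s) - 1 for s in samples)                             # D
    inv_lam0_hat = sum(w * vi / len(s) for w, vi, s in zip(weights, v, samples)) / d
    total_w = sum(weights)
    ybar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    return ybar, inv_lam0_hat, ybar - n_pop * inv_lam0_hat / total_w

def mse_41(phi_star, lam_star, n):
    return (phi_star + 2 + n) / (lam_star ** 2 * n)

def mse_44(phi_star, lam_star, n, d):
    return (phi_star + 2 + 2 * n / d) / (lam_star ** 2 * n)

print(bias_adjusted_estimate([[1.0, 0.5], [0.5, 0.25]], [2.0, 2.0]))
print(mse_41(1.0, 1.0, 4), mse_44(1.0, 1.0, 4, 8))
```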


8. Acknowledgments. Much of this work was done under a Senior Scholar- 
ship at the University of Reading, England, followed by financial assistance 
from the British Empire Cancer Campaign. The recent developments were 
made under the sponsorship of the National Science Foundation. The figures 
were drawn by Mrs. D. Hamilton from tables of values which were computed 
by Mr. G. Zorbalas. 







REFERENCES 

[1] M. C. K. Tweedie, "Inverse statistical variates," Nature (London), Vol. 155 (1945), p. 453.

[2] G. N. Watson, A Treatise on the Theory of Bessel Functions, 2nd ed., Cambridge University Press, 1944.

[3] E. Schrödinger, "Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung," Physikalische Zeitschrift, Vol. 16 (1915), pp. 289-295.

[4] A. Wald, "On cumulative sums of random variables," Ann. Math. Stat., Vol. 15 (1944), pp. 283-296.

[5] M. C. K. Tweedie, "A mathematical investigation of some electrophoretic measurements on colloids," unpublished thesis for M.Sc. degree, University of Reading, England, 1941.

[6] M. C. K. Tweedie, "Functions of a statistical variate with given means, with special reference to Laplacian distributions," Proc. Cambridge Philos. Soc., Vol. 43 (1947), pp. 41-49.

[7] E. J. Scott, Transform Calculus with an Introduction to Complex Variables, Harper, New York, 1955.

[8] D. V. Widder, The Laplace Transform, Princeton University Press, 1941.

[9] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1950.





VARIANCES OF VARIANCE COMPONENTS: III. THIRD 
MOMENTS IN A BALANCED SINGLE 
CLASSIFICATION¹


By John W. Tukey


Princeton University 


1. Summary. The methods used in earlier papers of this series [2], [3] are 
extended from variances to third moments, and applied to the third moment 
about the mean (= third cumulant) of the usual estimate of the between vari- 
ance component in a balanced single classification. The result is moderately 
complex, but manageable. 


2. Introduction. The technique of this paper grows directly out of those used 
for variances in the earlier papers in this series [2], [3]. We assume familiarity 
with the terminology and notations used there. 

We begin by discussing the third moment of the variance of a sample as an 
illustration of problems and technique, and then pass on directly to the main 
problem. We need the multiplication formulas (see Wishart [4])


4(n — 2) 
—-—-—-—-- ks ° 
‘+ n(n — 1)? 
’ 1 


9 ‘ 
———— Ker —— k } LT 
ps Mas Fae © 4 ap 


— 
(N — 2), 


ke ke 


2 key + N a kgs 

3. The variance. We can now proceed to write down the third moment about the mean (that is, the third cumulant) of the variance of a sample drawn from a finite population. There are various ways to do this, but we shall begin with one resembling the first method we used [1] to get the variance of the variances (we revert to using primes for population quantities for three sections):


$$\text{ave}\{(k_2 - k_2')^3\} = \text{ave}\{k_2^3 - 3k_2^2k_2' + 3k_2(k_2')^2 - (k_2')^3\} = \text{ave}\{k_2^3\} - 3k_2'\,\text{ave}\{k_2^2\} + 2(k_2')^3,$$

$$k_2'k_2' = k_{22}' + \frac{1}{N}k_4' + \frac{2}{N-1}k_{22}'.$$


Received April 6, 1955. 

¹ Prepared in connection with research sponsored by the Office of Naval Research and based on part of Memorandum Report 45, "Finite Sampling Simplified," Statistical Research Group, Princeton University, which was written while the author was a Fellow of the John Simon Guggenheim Foundation.







VARIANCE COMPONENTS 


( oni ; 
< 7 6 
= ave< Koo + 3(n + 3) hes + ne be 
n(n — 1) in — 1)* 


yt 4(n — 2) 


——_—_—— k 
n* n(n — 1)? = 


gi i 5 te 1 2 t 
_ 3ke ave 4 koe + > kg + uae kn} = 2k’ 


’ 3(n 3) 6n a 2 , l A(n —_ 2) 
= Koss geovemmnes Bess 4 — fi ——_—_—... kas 
mT n(n — 1) uta _ jye Raw + he ‘i S 1)? ks 


9 . 

ar. 2? ° 4,7 6 ty? . ‘\3 

— 3ke kes _— = kok, v2 22 _ 2(ke) 
n n—1 


3(n + 3) 6n + 2 1 4(n — 2) 
n(n — 1) ta + (n — 1)? kame + ake + “nin — 1) ka 


, 6 | 12 6 
— 3kin — 5 kn — 3 — kee + TiN 


3 a ee pero VED apo. 
a" W—-tn W-itn 


24 ” 12 
q-Djw-p Cicada, Rae 


6N 2 ea , 
2 = Yap kane + Ske + 5 7 N — 1)? ks3. 


>) 


a 18 
+ 3; > ks +4 ln(fn—-1) M—I1)N + VW — 


4 12 8 
lam — 1) G—iDNW-D ’ NW 1+ | 
f 8 24 
TIG—-IY &-DW-D' Wp 
f 12 12 24 24 
Deena ceca, SMT cca RAS aint et hacis hee ae 
Inn —1) (W—DN (N—In + N(N — 1) ku, 





which vanishes for n = N, just as it should. 
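The two facts that anchor this calculation, that $k_2$ is unbiased for $k_2'$ under sampling without replacement and that its third cumulant vanishes when $n = N$, can be confirmed by brute force on a tiny population. The sketch below uses a hypothetical population and exact rational arithmetic to enumerate every sample; it does not reproduce the polykay expression above, only the quantity that expression describes.

```python
from fractions import Fraction
from itertools import combinations

def k2(sample):
    """Unbiased variance k_2 (divisor n - 1), computed exactly."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def k2_sampling_moments(population, n):
    """ave{k_2} and ave{(k_2 - ave k_2)^3} over all C(N, n) samples without replacement."""
    values = [k2(s) for s in combinations(population, n)]
    ave = sum(values) / len(values)
    third = sum((v - ave) ** 3 for v in values) / len(values)
    return ave, third

population = [0, 1, 3, 4, 9]            # hypothetical finite population, N = 5
for n in range(2, len(population) + 1):
    print(n, *k2_sampling_moments(population, n))
```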

Now it is even clearer for the third cumulant (which might perhaps be called 
the skewmulant) than it was for the variance, that direct calculation would be 
long, tedious, boring, and, because of its length, likely to be erroneous for any 
component in an analysis of variance situation. We must go to more ingenious 
methods. 





380 JOHN W. TUKEY 


4. Structure. We notice that the third cumulant of any algebraic function of 
a random sample takes the form 


ave {u‘} — 3 ave {u} ave {u'} + 2(ave {u})* 
= a(n) — 388(n)y(n) + 28, 
where $\alpha(n)$, $\beta(n)$, and $\gamma(n)$ depend on $n$ alone, and $\delta$ is independent of $n$ and $N$, all four being expressible linearly in polykays. The only appearance of $N$ comes from the multiplication formulas involved in


B(n) -y(n) 
and 


6-6-6. 
The final result has the form

$$A(n) + \sum B(n)C(N) + D(N),$$

where the functions of $N$ come from

$$-3\,\text{ave}\{u\}\,\text{ave}\{u^2\} + 2\{\text{ave}\{u\}\}^3,$$

which we may suppose already known.


We can again proceed by finding these two terms first, and then using special 
populations to determine the coefficients in A(n). 


5. The variance-second method. We have 


$$\text{ave}\{k_2\} = k_2',$$

$$\text{ave}\{k_2^2\} = (\text{ave}\{k_2\})^2 + \text{var}\{k_2\} = \left(1 + \frac{2}{n-1}\right)k_{22}' + \frac{1}{n}k_4';$$

then $\sum B(n)C(N) + D(N)$ must come from
6 ” 
n— 1 


er. \ ‘\3 ant in? 
kooke one . ks ke + 2(ke) = Skee ke ’ 


whence 


oe aneiieaed ) - 4 ‘ 9 ot 
pe B(n)C(N) —_. N i a | koe = NIN ~ kia) 


= Be + yey 
N=—2 | 
v= 1 
os Micon 
N(N — 1) 


\ 


4 ’ ’ 
me Ko20 a kis) ’ 







thus, the part of the third cumulant of the usual estimate of $k_2$ which depends on the population size, $N$, is


») 


2 3 / 
\wi ~ aw} ® 


, 


9 
14 18 8 12 } ks 


TiNW-1) aN +1) NW-—1!* @-DNW— 1) 
24 12 ) 
NIN— 1) N@—1l) aW 
24 
r \w-—1 y m—DW— DJ 


When $N = n$ this reduces to the negative of


, 
ko . 


: » kis + ——— 


< ki 
{n(n — 1) n(n — 1)? n(n ae 


4 4 12 ' 8 
a “eee aes 
= “4 + in — 1)3 
and since the $k_2$ estimate is constant when $n = N$, the additional terms not involving $N$ must be those just given in (*).


6. The between component in a balanced single classification. We now come back to the balanced single classification and drop the primes. Hence the column contribution is drawn from $m$, $k_1$, $k_2$, $\cdots$, and the error contribution is drawn from $N$, $K_1$, $K_2$, $\cdots$. We are going to find the third cumulant of the between component in sampling from an arbitrary finite population.

When we express this third cumulant multilinearly in the $k$'s and $K$'s, there may be terms involving

(1) $k_6$, $k_{24}$, $k_{33}$, $k_{222}$,

(2) $k_2K_4$, $k_2K_{22}$, $k_3K_3$, $k_4K_2$, $k_{22}K_2$,

(3) $K_6$, $K_{24}$, $K_{33}$, $K_{222}$,

and no others (because of homogeneity and invariance under translation). The discussion just given for the variance of a sample applies with minor changes to the case where the errors are constant. This determines the coefficients of the terms involving the monomials in (1) to be the same as for the third cumulant of the variance, with $c$ replacing $n$.

We go next to the terms involving the monomials in (3). Suppose first that the column contributions are constant. If we take a minimal unit population for the individual contributions, the between component is constant. Thus the desired third cumulant, and everything it could involve except $K_6$, vanishes. Thus the coefficient of $K_6$ is itself zero.

In order to deal with the next coefficients, we can use a population with two nonzero values for the individual contributions. But if both values are alike, we cannot distinguish between the coefficients of $K_{33}$ and $K_{24}$. So we use a population of $rc$ values $1, t, 0, \cdots, 0$, for which

a 2 4 
Ku 2t ' t+t 


~ #elre = 1)’ * relre — 1)° 





There are two cases to consider.

Case 1. (Probability $(r - 1)/(rc - 1)$.) One $x = 1$ and another $x = t$ in the same column; others, zero.

Case 2. (Probability $r(c - 1)/(rc - 1)$.) One $x = 1$ and another $x = t$ in different columns; others, zero.
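The two case probabilities follow from placing the values $1$ and $t$ in two distinct cells chosen at random among the $rc$ cells. A short enumeration sketch (the cell numbering is an illustrative convention: cell $p$ lies in column $\lfloor p/r \rfloor$) confirms them exactly.

```python
from fractions import Fraction
from itertools import permutations

def case_probabilities(r, c):
    """Exact probabilities of Case 1 (same column) and Case 2 (different columns)."""
    pairs = list(permutations(range(r * c), 2))      # ordered cells holding 1 and t
    same = sum(1 for p, q in pairs if p // r == q // r)
    return Fraction(same, len(pairs)), Fraction(len(pairs) - same, len(pairs))

print(case_probabilities(3, 2))   # (Fraction(2, 5), Fraction(3, 5))
```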

The corresponding analyses of variance are 


CASE | 


Between 


Within 


dba 1)(@ + 1) 2t 


Between ; _— 


Within 


Thus the mean of the between component vanishes, as it should, and the 
third cumulant reduces to the third moment about zero, which is 


f 


— | 9 3 — F 9 \3 
7 1 ;2 t \ 4 re ; 1) an t 
ro— ii\rer — IJ re — | 





rer(c — 1){ 


aa fi Bek i als 
rPe&(rc— 1) \(r— 1)? re — 1)*) ° 


rm™.: 3 . . 2 4 ° 
This has ¢ as a factor, and does not involve t + f, hence the coefficient of 
Kz is zero, while that of K3; is 


ee ee 1 
Pe \(r —1)? r(e- De 
which, as we might have suspected, is small when the two sets of degrees of 
freedom are of nearly the same size. 
We can deal with the coefficient of $K_{222}$, and certain of the other terms, by resorting to normal theory. In this case, the two mean squares become independent and their third cumulants are

$$8(K_2 + rk_2)^3/(c - 1)^2, \qquad 8K_2^3/c^2(r - 1)^2,$$

so that the third cumulant of the between component is





$$\frac{8}{r^3}\left\{\frac{(K_2 + rk_2)^3}{(c-1)^2} - \frac{K_2^3}{c^2(r-1)^2}\right\} = \frac{8}{r^3}\left(\frac{1}{(c-1)^2} - \frac{1}{c^2(r-1)^2}\right)K_{222} + \frac{24}{r^2(c-1)^2}K_{22}k_2 + \frac{24}{r(c-1)^2}K_2k_{22} + \frac{8}{(c-1)^2}k_{222}.$$

Thus, we have determined the coefficients of $K_{222}$, $K_{22}k_2$, and $K_2k_{22}$, and have confirmed the part of the coefficient of $k_{222}$ which is independent of $N$.
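The normal-theory step can be verified with exact arithmetic. A normal-theory mean square with expectation $E$ and $\nu$ degrees of freedom is $E\chi^2_\nu/\nu$, whose third cumulant is $8E^3/\nu^2$; the between component is $(\text{MS}_{\text{between}} - \text{MS}_{\text{within}})/r$ with independent mean squares. The sketch treats the polykays as the corresponding powers of $K_2$ and $k_2$, which is how they enter through these mean squares, and checks the expansion for a few hypothetical values.

```python
from fractions import Fraction

def kappa3_mean_square(expectation, df):
    """Third cumulant of expectation * chi2_df / df (kappa3 of chi2_df is 8 df)."""
    return 8 * expectation ** 3 / Fraction(df) ** 2

def kappa3_between_component(K2, k2, r, c):
    """Third cumulant of (MS_between - MS_within) / r with independent mean squares."""
    between = kappa3_mean_square(K2 + r * k2, c - 1)
    within = kappa3_mean_square(K2, c * (r - 1))
    return (between - within) / Fraction(r) ** 3

def expanded_form(K2, k2, r, c):
    """The same quantity as coefficients of K_222, K_22 k_2, K_2 k_22 and k_222."""
    return (Fraction(8, r ** 3) * (Fraction(1, (c - 1) ** 2) - Fraction(1, c ** 2 * (r - 1) ** 2)) * K2 ** 3
            + Fraction(24, r ** 2 * (c - 1) ** 2) * K2 ** 2 * k2
            + Fraction(24, r * (c - 1) ** 2) * K2 * k2 ** 2
            + Fraction(8, (c - 1) ** 2) * k2 ** 3)

print(kappa3_between_component(Fraction(3, 2), Fraction(2, 5), 4, 3),
      expanded_form(Fraction(3, 2), Fraction(2, 5), 4, 3))
```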
There remain the coefficients of $k_2K_4$, $k_3K_3$, and $k_4K_2$. These we shall seek by taking a minimal unit population for the column contributions and a minimal population with one nonzero value equal to $t$ for the error. We have


ky | Cc. ks = 1/c, ke = 1/e, 
K; = t/re, K; = t/re, 


al 3 
> 


3r° r 72 
7 K, ke ~ i ak 1)? v 


(— 1 


We have the usual two cases to deal with.

Case 3. (Probability $1/c$.) One $x = t + 1$, others in column $= 1$, others zero.

Case 4. (Probability $(c - 1)/c$.) One column of $x$'s $= 1$, one other $x = t$, all others zero.


Corresponding analyses of variance are 


Case 3

Between
Within

Case 4

Between

Within







The mean between component is 1/c, as it should be, and its third moment 
about the mean is 


1 ya > ) - tao} 
ec \re c re(c — 1) =a\ (c — 1)7/ ° 
The terms in ¢’ and ¢ are conspicuous in their absence, and hence the coefficients 
of k,K2 and kK, vanish. The coefficient of k;K; is 
ae ) 
Se 
r ¢? | (c — 1)? 
We can now reassemble all our coefficients and write down the third cumulant of the between component in a balanced single classification with $c$ categories and $r$ units per category. The result is


oe cane 3 4 18 14 | 
\a-ata bt {a —1) (c—i1n' nn— pt * 
i ee ale al nk 
\ete —1)? (e—1)nrn—-1)) nin — 1)? 

12 12 24 24 «(| 
as Ae la ca : a 
+ lee — 1) (e—1)n clin —1) + n(n — 1){ 


‘ 


kas 








cm 24 16) 


24 f l 


r(c — 1)? z r 


4 1 1 l \ 
ate —1)? re — 1) Kua + eon — 1)? ~ &r — 1) 


for populations of m column contributions and N error contributions. 
For infinite populations this reduces slightly, and becomes 


4(c — 2) ete 8 24 


+ ——, ken Ke + m4 i- ee ks Ks + a 
cS "¢ (c — 1) ' fe— 





ke + koo0 + —_—_—_—— ko 


2K 
~ 2 Wem IP , 


ae aa Ge " 
1 


~ (e— 1) 


24 


aust r(e — 1)3 — 13 


ke Kx» 


] - 8 l | : 
(r—1)? r(e— > Ke + A\C —-1)? e(r- ip Ke 
REFERENCES 

[1] John W. Tukey, "Some sampling simplified," J. Amer. Statist. Assn., Vol. 45 (1950), pp. 501-519.

[2] John W. Tukey, "Variances of variance components: I. Balanced designs," Ann. Math. Stat., Vol. 27 (1956), pp. 722-736.

[3] John W. Tukey, "Variances of variance components: II. Unbalanced single classifications," Ann. Math. Stat., Vol. 28 (1957), pp. 43-56.

[4] John Wishart, "Moment coefficients of the k-statistics in samples from a finite population," Biometrika, Vol. 39, pp. 1-13.





SOME REMARKS ON SYSTEMATIC SAMPLING¹


By Werner Gautschi²
University of California, Berkeley 


1. Introduction and summary. Consider a finite population consisting of $N$ elements $y_1, y_2, \cdots, y_N$. Throughout the paper we will assume that $N = nk$. A systematic sample of $n$ elements is drawn by choosing one element at random from the first $k$ elements $y_1, \cdots, y_k$, and then selecting every $k$th element thereafter. Let $y_{ij} = y_{i+(j-1)k}$ ($i = 1, \cdots, k$; $j = 1, \cdots, n$); obviously systematic sampling is equivalent to selecting one of the $k$ "clusters"

$$C_i = \{y_{ij} : j = 1, \cdots, n\}$$

at random. From this it follows that the sample mean $\bar y_i = \frac{1}{n}\sum_{j=1}^{n} y_{ij}$ is an unbiased estimate for the population mean $\bar y = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n} y_{ij}$ and that $\operatorname{Var}\bar y_i = \frac{1}{k}\sum_{i=1}^{k}(\bar y_i - \bar y)^2$. We will denote this variance by $V_{sy}^{(1)}$, indicating by the superscript that only one cluster is selected at random. $V_{sy}^{(1)}$ can be written as


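$$V_{sy}^{(1)} = S^2 - \frac{1}{k}\sum_{i=1}^{k}S_i^2, \quad \text{where} \quad S^2 = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y)^2, \quad S_i^2 = \frac{1}{n}\sum_{j=1}^{n}(y_{ij} - \bar y_i)^2. \tag{1}$$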


It is natural to compare systematic sampling with stratified random sampling, where one element is chosen independently in each of the $n$ strata $\{y_1, \cdots, y_k\}$, $\{y_{k+1}, \cdots, y_{2k}\}$, $\cdots$, and with simple random sampling using sample size $n$. The corresponding variances of the sample mean will be denoted by $V_{st}^{(n)}$, $V_{ran}^{(n)}$ respectively.
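The decomposition (1) is the usual between/within split of the total sum of squares over the $k$ clusters. A short exact check on a hypothetical population (clusters taken as the slices `y[i::k]`, which is the 0-based form of $y_{ij} = y_{i+(j-1)k}$) also confirms that the cluster means average to the population mean, i.e. that $\bar y_i$ is unbiased.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def clusters(y, k):
    """C_i = {y_i, y_{i+k}, ...}; 0-based cluster i is y[i::k]."""
    return [y[i::k] for i in range(k)]

def v_sy1(y, k):
    """V_sy^(1): variance of the sample mean over the k equally likely clusters."""
    grand = mean(y)
    return sum((mean(c) - grand) ** 2 for c in clusters(y, k)) / k

def v_sy1_decomposed(y, k):
    """Right-hand side of (1): S^2 - (1/k) sum_i S_i^2 (divisors N and n)."""
    grand = mean(y)
    s2 = sum((v - grand) ** 2 for v in y) / len(y)
    within = sum(sum((v - mean(c)) ** 2 for v in c) / len(c) for c in clusters(y, k))
    return s2 - within / k

y = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]      # hypothetical population, N = 12
for k in (2, 3, 4, 6):
    print(k, v_sy1(y, k), v_sy1_decomposed(y, k))
```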

We consider now the following generalization of systematic sampling, which appears to have been suggested by J. Tukey (see [3], p. 96, [4], [5]). Instead of choosing at first only one element at random, we select a simple random sample of size $s$ (without replacement) from the first $k$ elements and then every $k$th element following those selected. In this way we obtain a sample of $ns$ elements and, if $i_1, i_2, \cdots, i_s$ are the serial numbers of the elements first chosen, the sample mean $\frac{1}{s}(\bar y_{i_1} + \cdots + \bar y_{i_s})$ can be used as an estimate for the population mean. This sampling procedure is clearly equivalent to drawing a simple random sample of size $s$ from the $k$ clusters $C_i$ ($i = 1, \cdots, k$). It therefore follows (see, for example, [2], Chapters 2.3 to 2.4) that the sample mean is an unbiased estimate for the population mean and that its variance, which we denote by $V_{sy}^{(s)}$, is given by³

Received June 12, 1956. 

¹ This investigation was supported (in part) by a research grant (RG-3666) from the Institutes of Health, Public Health Service.

² Now at Ohio State University.

³ This formula is not new, but appeared already in [6] and, more recently, in [5].







WERNER GAUTSCHI 


$$V_{sy}^{(s)} = \frac{1}{s}\,\frac{k-s}{k-1}\,V_{sy}^{(1)}. \tag{2}$$
te emi - 
Again, it is natural to compare this sampling procedure with stratified random sampling, where a simple random sample of size $s$ is drawn independently in each of the $n$ strata $\{y_1, \cdots, y_k\}$, $\{y_{k+1}, \cdots, y_{2k}\}$, $\cdots$, or with simple random sampling employing sample size $ns$. We denote the corresponding variances of the sample mean (which in both cases is an unbiased estimate for the population mean) by $V_{st}^{(ns)}$, $V_{ran}^{(ns)}$ respectively. From well-known variance formulae (see, for example, [2], Chapters 2.4 and 5.3) it follows that

$$V_{st}^{(ns)} = \frac{1}{s}\,\frac{k-s}{k-1}\,V_{st}^{(n)}, \qquad V_{ran}^{(ns)} = \frac{N-ns}{N-n}\,\frac{1}{s}\,V_{ran}^{(n)} = \frac{1}{s}\,\frac{k-s}{k-1}\,V_{ran}^{(n)}. \tag{3}$$

Thus the relative magnitudes of the three variances $V_{sy}^{(s)}$, $V_{st}^{(ns)}$, $V_{ran}^{(ns)}$ are the same as for $V_{sy}^{(1)}$, $V_{st}^{(n)}$, $V_{ran}^{(n)}$, of which comparisons were made for several types of populations by W. G. Madow and L. H. Madow [6] and W. G. Cochran [1]. Some of the results will be reviewed in Section 3.
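Formulas (2) and (3) say that all three designs shrink by the same factor $(k-s)/\{s(k-1)\}$ when $s$ elements (or clusters) replace one. The sketch below checks this exactly by enumerating every equally likely sample of each design for a hypothetical population with $N = 6$, $k = 3$, $n = 2$, $s = 2$; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

def mean(values):
    return Fraction(sum(values), len(values))

def variance_of_mean(samples, target):
    """Variance of the (unbiased) sample mean over equally likely samples."""
    return sum((mean(s) - target) ** 2 for s in samples) / len(samples)

def systematic_samples(y, k, s):
    """s random starts among the first k elements, i.e. s of the k clusters y[i::k]."""
    clus = [y[i::k] for i in range(k)]
    return [sum(chosen, []) for chosen in combinations(clus, s)]

def stratified_samples(y, k, s):
    """s elements without replacement from each stratum of k consecutive elements."""
    strata = [y[j:j + k] for j in range(0, len(y), k)]
    per_stratum = [[list(c) for c in combinations(st, s)] for st in strata]
    return [sum(parts, []) for parts in product(*per_stratum)]

def random_samples(y, size):
    return [list(c) for c in combinations(y, size)]

y = [3, 1, 4, 1, 5, 9]          # hypothetical population: N = 6, k = 3, n = 2
k, n, s = 3, 2, 2
target = mean(y)
factor = Fraction(k - s, s * (k - 1))
for label, one, many in [("sy", systematic_samples(y, k, 1), systematic_samples(y, k, s)),
                         ("st", stratified_samples(y, k, 1), stratified_samples(y, k, s)),
                         ("ran", random_samples(y, n), random_samples(y, n * s))]:
    print(label, variance_of_mean(many, target), factor * variance_of_mean(one, target))
```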

The object of this note is to compare systematic sampling with $s$ random starts, as described above, with systematic sampling employing only one random start but using a sample of the same size $ns$. To make this comparison we obviously have to assume that $k$ is an integral multiple of $s$, say $k = ls$. The latter procedure then consists in choosing one element at random from the first $l$ elements $\{y_1, \cdots, y_l\}$ and selecting every $l$th consecutive element. We denote the variances of the sample mean of the two procedures by $V_k^{(s)}$, $V_l^{(1)}$ respectively, indicating by the subscript the size of the initial "counting interval." (In our notation $V_{sy}^{(s)} = V_k^{(s)}$.) We shall show in Section 4 that $EV_l^{(1)} = EV_k^{(s)}$ in the case of a population "in random order," but $EV_l^{(1)} \le EV_k^{(s)}$ for a population with a linear trend or with a positive correlation between the elements which is a decreasing convex function of their distance apart. Some numerical results on the relative precision of the two procedures will be given in Section 5 for the case of a large population with an exponential correlogram.


2. Acknowledgment. I wish to express my debt to Professor W. Kruskal for 
having brought the question treated in this note to my attention. 


3. Cochran's approach. Extension of Cochran's results to systematic sampling with multiple random starts. Instead of considering a particular single population $\{y_1, y_2, \cdots, y_N\}$ we assume, following Cochran ([1]; [2], Chapter 8), that the $y_i$'s are drawn from an infinite population having some specified properties. We are then interested in comparing the expected variance $E(V \mid y_1, \cdots, y_N)$ rather than $(V \mid y_1, \cdots, y_N)$ for the sampling procedures under consideration. More specifically, we consider the following three types of populations.

SYSTEMATIC SAMPLING 387

(i) Population in random order. The variates $y_i$ are assumed to be uncorrelated and to have the same expectations. The variances may change with $i$:

$$Ey_i = \mu, \qquad E(y_i - \mu)^2 = \sigma_i^2 \quad (i = 1, \cdots, N); \qquad E(y_i - \mu)(y_j - \mu) = 0 \quad (i \ne j). \tag{4}$$
It is not difficult to show ([2], Chapter 8.5) that in this case

$$EV_{sy}^{(1)} = EV_{st}^{(n)} = EV_{ran}^{(n)} = \frac{k-1}{N}\,\bar\sigma^2, \tag{5}$$

where $\bar\sigma^2 = \sum_{i=1}^{N}\sigma_i^2/N$.
(ii) Population with a linear trend. We assume that the $y_i$'s are uncorrelated variates whose expectations change linearly in $i$, more precisely

$$Ey_i = \alpha + \beta i, \qquad \operatorname{Var} y_i = \sigma^2 \quad (i = 1, 2, \cdots, N), \tag{6}$$

$$\operatorname{Cov}(y_i, y_j) = 0 \quad (i \ne j). \tag{7}$$
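Applying standard linear regression theory (see, for instance, [7], Chapter 14.2) to the sum of squares in (1), it is easily found that

$$EV_{sy}^{(1)} = \frac{k-1}{N}\sigma^2 + \beta^2\frac{k^2-1}{12}. \tag{8}$$

In a similar way we obtain

$$EV_{st}^{(n)} = \frac{k-1}{N}\sigma^2 + \beta^2\frac{k^2-1}{12n}, \qquad EV_{ran}^{(n)} = \frac{k-1}{N}\sigma^2 + \beta^2\frac{(k-1)(nk+1)}{12}. \tag{9}$$

Thus

$$EV_{st}^{(n)} \le EV_{sy}^{(1)} \le EV_{ran}^{(n)}, \tag{10}$$

with equality only if $n = 1$.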
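The trend terms in (8) and (9) can be verified exactly for a pure trend $y_i = i$ (that is $\beta = 1$, $\sigma = 0$); the random part only adds the common term $(k-1)\sigma^2/N$ to each variance. The sketch enumerates every equally likely sample of each design for hypothetical sizes $n = 3$, $k = 4$.

```python
from fractions import Fraction
from itertools import combinations, product

def variance_of_mean(samples):
    """Variance of the sample mean over equally likely samples (exact)."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)

def trend_variances(n, k):
    """V_sy^(1), V_st^(n), V_ran^(n) for the pure trend y_i = i, i = 1..N."""
    y = list(range(1, n * k + 1))
    sy = variance_of_mean([y[i::k] for i in range(k)])
    st = variance_of_mean(list(product(*[y[j:j + k] for j in range(0, n * k, k)])))
    ran = variance_of_mean(list(combinations(y, n)))
    return sy, st, ran

n, k = 3, 4
print(trend_variances(n, k))   # (k^2-1)/12, (k^2-1)/(12n), (k-1)(nk+1)/12
```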

(iii) Population with serial correlation. It is assumed that two elements $y_i$, $y_j$ are positively correlated with a correlation which depends only on the "distance" $u = |j - i|$ and which decreases as $u$ increases. The mean and variance of all the $y_i$ are supposed to be constant:

$$Ey_i = \mu, \qquad E(y_i - \mu)^2 = \sigma^2, \qquad E(y_i - \mu)(y_{i+u} - \mu) = \rho_u\sigma^2, \tag{11}$$

where $\rho_u \ge \rho_v \ge 0$ for $u < v$. For this type of population Cochran [1] obtained the following results relevant to our purpose:

$$EV_{sy}^{(1)} = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left[1 - \frac{2}{N(k-1)}\sum_{u=1}^{N-1}(N-u)\rho_u + \frac{2k}{n(k-1)}\sum_{u=1}^{n-1}(n-u)\rho_{ku}\right], \tag{12}$$
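Formula (12) can be checked against a direct computation from the covariance matrix, using $EV_{sy}^{(1)} = \frac{1}{k}\sum_i \operatorname{Var}(\bar y_i) - \operatorname{Var}(\bar y)$, which holds because every $\bar y_i$ has expectation $\mu$. The sketch assumes $\sigma^2 = 1$ and a hypothetical exponential correlogram $\rho_u = e^{-au}$.

```python
import math

def expected_v_sy_direct(rho, n, k):
    """(1/k) sum_i Var(cluster mean i) - Var(population mean), with sigma^2 = 1."""
    N = n * k

    def cov(i, j):
        return 1.0 if i == j else rho(abs(i - j))

    def var_mean(idx):
        idx = list(idx)
        return sum(cov(i, j) for i in idx for j in idx) / len(idx) ** 2

    return sum(var_mean(range(i, N, k)) for i in range(k)) / k - var_mean(range(N))

def expected_v_sy_cochran(rho, n, k):
    """Formula (12) with sigma^2 = 1."""
    N = n * k
    s_all = sum((N - u) * rho(u) for u in range(1, N))
    s_sys = sum((n - u) * rho(k * u) for u in range(1, n))
    return (1 / n) * (1 - 1 / k) * (1 - 2 * s_all / (N * (k - 1)) + 2 * k * s_sys / (n * (k - 1)))

rho = lambda u: math.exp(-0.3 * u)      # hypothetical exponential correlogram
print(expected_v_sy_direct(rho, 3, 4), expected_v_sy_cochran(rho, 3, 4))
```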


























$$EV_{st}^{(n)} \le EV_{ran}^{(n)}, \tag{13}$$

$$EV_{sy}^{(1)} \le EV_{st}^{(n)}, \tag{14}$$

(14) applying if, in addition, $\rho_u$ is convex downwards.

In virtue of (2) and (3), all the results (5), (10), (13) and (14) carry over immediately to the more general sampling procedure discussed in Section 1, and, moreover, the relative sizes of the variances $V_{sy}^{(s)}$, $V_{st}^{(ns)}$, $V_{ran}^{(ns)}$ remain the same as those of $V_{sy}^{(1)}$, $V_{st}^{(n)}$, $V_{ran}^{(n)}$. Numerical results on the relative precision were given by Cochran [1] for populations with linear and exponential correlograms.


4. Comparison of systematic sampling and systematic sampling with multiple random starts.

(i) Population in random order. From (5), replacing $k$ by $l$ and $n$ by $ns$, we obtain

$$EV_l^{(1)} = \frac{l-1}{lns}\,\bar\sigma^2 = \frac{l-1}{N}\,\bar\sigma^2.$$

On the other hand, by (2) and (5), remembering that $k = sl$,

$$EV_k^{(s)} = \frac{1}{s}\,\frac{k-s}{k-1}\,\frac{k-1}{N}\,\bar\sigma^2 = \frac{l-1}{N}\,\bar\sigma^2.$$

Thus

$$EV_l^{(1)} = EV_k^{(s)}. \tag{15}$$
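(ii) Population with linear trend. By (2) and (8),

$$EV_k^{(s)} = \frac{1}{s}\,\frac{k-s}{k-1}\left[\frac{k-1}{N}\sigma^2 + \beta^2\frac{k^2-1}{12}\right] = \frac{l-1}{N}\sigma^2 + \beta^2\frac{(l-1)(ls+1)}{12},$$

whereas (8) with $k$ replaced by $l$ and $n$ by $ns$ gives $EV_l^{(1)} = \frac{l-1}{N}\sigma^2 + \beta^2\frac{l^2-1}{12}$. Hence

$$EV_l^{(1)} \le EV_k^{(s)}, \tag{16}$$

with equality only if $s = 1$.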

Both these results are, of course, to be expected intuitively. The comparison of $V_l^{(1)}$ and $V_k^{(s)}$ is, perhaps, mostly relevant for a population with a convex decreasing correlogram, since in this case $EV_l^{(1)}$ turns out to be the smallest among all the variances $EV_l^{(1)}$, $EV_k^{(s)}$, $EV_{st}^{(ns)}$, $EV_{ran}^{(ns)}$.

(iii) Population with serial correlation. From (12) and (2),

$$EV_l^{(1)} = \frac{l-1}{N}\sigma^2(1 - L_l), \qquad L_l = \frac{2}{N(l-1)}\sum_{u=1}^{N-1}(N-u)\rho_u - \frac{2l}{ns(l-1)}\sum_{u=1}^{ns-1}(ns-u)\rho_{lu}, \tag{17}$$

$$EV_k^{(s)} = \frac{l-1}{N}\sigma^2(1 - L_k), \qquad L_k = \frac{2}{N(k-1)}\sum_{u=1}^{N-1}(N-u)\rho_u - \frac{2k}{n(k-1)}\sum_{u=1}^{n-1}(n-u)\rho_{ku}. \tag{18}$$
It is easy to check that both $L_l$ and $L_k$ are linear forms in the $\rho_u$'s in each of which the sum of coefficients is equal to 1. Hence, in order to show that $EV_l^{(1)} \le EV_k^{(s)}$, it is enough to prove that

$$L = L_l - L_k \ge 0, \tag{19}$$

$L$ being a linear form of the $\rho_u$'s whose sum of coefficients is zero. If in addition to the monotonicity the $\rho_u$ are assumed to be convex, the following lemma, which is analogous to the lemma proved in [1], is applicable to forms of this type.

Lemma. Let S be the set of p = {pr, po, +-* , pm} for which 
(20) aA=mae*:: = pr =O 
and 
(21) Appt = Pati — 2p5 + Ppa = O (gp = 2,3, ---,m — 1). 


Let a, +--+ , &m be constants such that >-™, a, = Oand put A; = a a, . Then 


for all peS 
if and only if 


(22) => A,;20 for j = 1,2,--- 


Moreover, if in addition to (20) and (21) strict inequality holds in (22), then L > 0 
unless py) = -** = Pm. 

Proor. Writing a, = Ay — Aya(u = 1, --- ,m; Ao = 0) and using the fact 
that A,, = 0, we find 


m m—1 


L = Z Ag Pp oP > A, ae. Du A, Ap, . 


p= l p=l 
Similarly, 
m—1 


>, A, Ap, - hi B.A’ pp + Bul pm _ Pm—1)- 


pm] p=l 





390 WERNER GAUTSCHI 


Thus 
m—2 


(23) L = 2 B,A’py + Bus(Pm1 — Pm): 
p=l 

Since, by hypothesis, the coefficients of all the B, are nonnegative, the suffi- 
ciency of (22) is clear. On the other hand, if B,_,; < 0, we could choose the 
py linearly decreasing and obtain L < 0. If B; < 0,1 S 7 S m — 2, L could 
be made negative by taking, for example, 
(j+2—un, 
Pp = * 

(1, jt+1lsusm. 


Thus (22) is also a necessary condition. If all the B; are positive, then L = 0 
implies A’p, = O(u = 1,---, m — 2), pm: = pm. This in turn implies that 
Pm—2 = Pm—1, Pm—3 = Pm-2,°** » Pl = Pr. 

THEOREM. For any population in which 


Aat=zapea:::2)] pr) = 0, 


A’p.1 = pei — 2p. + i = 
we have 
(24) EV; s EV?’ 
with equality only if s = 1 or pp = «++ = pw. 


Proor. There is nothing to prove if s = 1. If s > 1 we apply the above 
lemma (with m = N — 1 and L given by (17), (18) and (19)) and show that 
(25) B,>0 j ,2,---,N—2 


We notice that 


N 1 [Sy 2 
> L; = = P (N — 2)pz — I > (ns — Son | 


z=) z=] 


z=] z==ul 


1 , | ' ; . | 
pins P (N — z)p, — (Is)? Dy (n — 2)p we. 
So 


To prove (25) it is enough to show that the sums B; are positive for the form 
NL/2 = NL,/2 — NL./2. We compute these sums separately for NL,/2, NL2/2 
and then take their differences. Put‘ 


j = vk + ol+dr = (vs + ol +A, where vy = 0,1,---,n —1; 
¢ =0,1,---,s-—1; 1=0,1,---,l-—1. 


(26) 


‘We use the Greek letters v, 7, \ to indicate their range n — 1, s — 1, / — 1, respec- 
tively; ¢, \ should not be confused with the variance symbol and the parameter to be in- 
troduced in Section 5 





SYSTEMATIC SAMPLING 


By elementary computations the sums BS” for NL,/2 are found to be 


where 


= t{(vs + o)l + Al[(vs + ol +r + 1)[3N — (vs + o)l — A — 2] 
1-2 [ = i(2ns — t — 1) (vs + o)(2ns — vs — o — »] 
C heed —— 5 - —_ — 


1 
+ (0 + 1) ——_——s— 
=1 ~ ~ 
Pi 1S ) 
= SS" [vs + o — 1)(8ns — vs — o — 1) 
+ 3(A+ 1)(2ns — vs — o — 1). 
Similarly the sums BS” for NL./2 are obtained as 


BY = 6a tie IIT}, 
ls — 1 


2 


t=] - 


; at ve, gs vena 3 
(Is)? ts > u(2n Hd sitet 1) en =v = »| 


(1s)° . 
; [ls(y — 1)(38n — vy — 1) + 3(ol + A + 1)(2n — v — I1)I. 
) 


We have to show that 


a 8 
6 (1 — 1)(ls — 1) 


. | (s — 1)61 — (ls — 1) a + (l — 1) oo] > 0. 


B; = BS’ — BY = 


After some elementary algebra the expression in brackets is found to be a poly- 
nomial f(c) in o of third degree with the following coefficients 


o:E(l — 1) 
: —3U(l — 1)[(n — v)ls — (A + 1)] 
:Uf{(l — 1)[3s(n — v)(sl — 2(A + 1)) — sl + 38rA + 28 + 1) 
— 3K(A + 1)(s — 1} 
°. (gs — 1){vls(l — 1)(ls — 1) + ACA + I)[Bla(n — v) — (A + 2))}. 





392 WERNER GAUTSCHI 


We notice that the second derivative f’(c) vanishes at 


A+ 
o* = (n — v)s — a7 
l 
which is 2 s — 1 whatever be the values of v, \ specified by (26). For any of 
those values f(c) is therefore concave between ¢ = 0 and o = s — 1 so that 
it is enough to show f(0) > 0, f(s — 1) > 0. Now, if ¢ = 0 then not both », 
can vanish. Hence, f(0) > 0 follows immediately from (27). On the other hand, 
f(s — 1)/(s — 1), after some slight rearranging, can be written as 
f(s — 1) 4 


on r* 3(n — yv)si[(l — 1)(l — 2(4 + 1)) + AA YF 1) 
s=— 


(28) +1{Ul — 1)((s — 1) — 8) + 3(s — IA + 11 — 1 — A)} 

+ A{3sl(l — 1) — (A+ 1) + 2)} + Ul — 1){28 + vs(ls — 1) + 1}. 
The expression in brackets is a polynomial of second degree in \ with a positive 
leading coefficient and with roots \ = 1 — 2, \ = 1 — 1. It is therefore non- 
negative for \ = 0,1, ---,l — 1. It is easily verified that the quantities in the 
three braces are nonnegative for / > 1, s > 2 and \, » satisfying (26). Further- 
more, the last term is positive. It remains to consider (28) for the particular 
case s = 2. We have 
fd) = 6 — 1) — 24+ 1)) + AAF+ DI 

+ {3 + 1) —1— aA) — Ul — 1)} 
+ rA{6ll — 1) — (A + 1)(A + 2)} + 51 — 1). 


The right-hand side is a polynomial ¢(X) of third degree, 


g(d) = —r + 3(1 — 1)’ — (8P — 6 + 2) + Ul — 1)(51 — 4), 


whose second derivative ¢’(X) vanishes at \ = 1 — 1. It is easy to verify that 
¢(A) has its relative minimum at \ = 1 — 1 — +/3/3. Hence ¢(A) > OforA = 
0,1, --- ,2 — 1 follows from 


g(t — 2) = g(l — 1) = 2021 — 1)(1 — 1) > 0. 


This completes the proof of our theorem. 

For populations with serial correlation the result (24) is to be expected also 
on intuitive grounds; in fact, the systematic sample is spread more evenly 
through the population than the sample with multiple random starts which 
may contain elements very close together, giving about the same information. 
Our proof, however, does not make clear why (24) only holds for populations 
with a convex correlogram. That (24) does not generally hold for any monotone 
decreasing correlogram can readily be seen by trying to apply Cochran’s lemma 
[1] to the linear form (19). It turns out that, for example, the sum of the first 
l coefficients of NL/2 is equal to 





SYSTEMATIC SAMPLING 


= [(2n — 1)s — 1] < O. 


2(ls — 1) 


. r(1 
One might suspect that EV;” = 
decreasing correlogram. However, according to our theorem EY} 


the example of a linear correlogram, so that the conjecture is not generally 


yt1) sxr(s) ¢ 
1 7 EV ke for 


very (6) ‘ ‘ 
EV,” for all populations with a concave 


true 
5. Asymptotic results in the case of an exponential correlogram. We assume 
(2 = 1,---,N — 1) and that both / and n are large. For n, k 


that p, = e 

large Cochran [1] showed that the expression in braces of (12) is approximately 
equal to 1 — 2/Ak + 2/(e“ — 1). Since the corresponding expression 1 — L, 
in (17) is obtained by replacing k by | and n by ns, we find 


9 
L = dam dem gt gree 


(29 


On the other hand, replacing 1 by k = Is, s by 1 in the brace of (17), we obtain 


1 — Le of (18). Thus 
2 


L-h~wl— st era 


, we see that the relative precision of systematic sampling 


. —rl 
Introducing p = e 
over systematic sampling with multiple random starts 


y r(s) 
es ee 
EV; 


depends, apart from s, only on the correlation p of elements of a distance / 
apart. Clearly lim,, RP = 1; also, expanding numerator and denominator in 
power series, it is readily seen that lim,,, RP = s. The numerical values in 
Table 1 show that the limit as p | 0 is approached rather slowly. 


6. Concluding remark. When the statistician has a choice between systematic 
sampling and systematic sampling with multiple random starts, he is more 


TABLE 1 


Relative precision RP of systematic sampling over systematic sampling with 


multiple random starts for an exponential correlogram 








394 WERNER GAUTSCHI 


likely to use the latter procedure because its variance can be estimated from 
the sample and the estimate is unbiased whatever be the form of the popula- 
tion. On the other hand, as we have seen in Section 5, systematic sampling is 
considerably more precise in the case of a population with an exponential cor- 
relogram. Thus, it may be worth while to try to find an estimate for the vari- 
ance of systematic sampling which is at least consistent in some sense if the 
underlying assumption of an exponential correlogram is realized. In view of 
(17) or (29) this would involve estimating the correlation between the elements 
as well as o?. 


REFERENCES 


". G. Cocuran, ‘Relative accuracy of systematic and stratified random samples for a 
certain class of populations,’’ Ann. Math. Stat., Vol. 17 (1946), pp. 164-177 
. G. Cocuran, Sampling Techniques, John Wiley and Sons, 1953 


’. E. Demine, Some Theory of Sampling, John Wiley and Sons, 1950. 


L. Jones, ‘‘The application of sampling procedures to business operations,”’ 
Amer. Stat. Assoc., Vol. 50 (1955), pp. 763-774 
L. Jones, “‘Investigating the properties of a sample mean by employing random 
subsample means,’’ J. Amer. Stat. Assoc., Vol. 51 (1956), pp. 54-83. 
.G. Mapow anv L. H. Mapow, ‘‘On the theory of systematic sampling,’’ Ann. Math 
Stat., Vol. 15 (1944), pp. 1-24 
(7] A. Moon, Introduction to the Theory of Statistics, McGraw-Hill Co., 1950 





TIGHTENED MULTI-LEVEL CONTINUOUS SAMPLING PLANS 


By C. Derman,' 8. Lirraver*®:* anp H. Sotomon?: * 
Columbia University 


1. Introduction. Industrial needs have provoked some recent studies on con- 
tinuous sampling. This procedure is especially of interest when the formation 
of inspection lots for lot-by-lot acceptance may be impractical or artificial as 
in conveyor-line production, or when there is an important need for rectifying 
quality of product as it is manufactured. 

These newer papers are best considered in the light of the earlier papers of 
Dodge [3] and Wald and Wolfowitz [11]. One point of departure from the Dodge 
type of plan has been the introduction of several levels of partial inspection 
with different rates of sampling in each level. Multi-level continuous sampling 
plans (which reduce to the Dodge plan when only one sampling level is tolerated) 
have been considered by Greenwood [8], Lieberman and Solomon [9], and Resni- 
koff [10]. A plan based on the Wald-Wolfowitz approach, a scheme essentially 
handled by the methodology of sequential analysis, was created and developed 
by Girshick about 1948 in connection with a Census Bureau problem and has 
only recently been reported [7]. The reader is referred to Bowker [1] for a more 
thorough account of continuous sampling plans. 

The multi-level plan given in [9], namely MLP, allows for any number of 
sampling levels, subject to the provision that transitions can only occur between 
adjacent levels. Three generalizations of MLP, accomplished by altering the 
manner in which transition can occur, are analyzed in this paper. In each situa- 
tion, we will make it more difficult to get to infrequent inspection than in MLP, 
and thus we can label these three plans as tightened plans. These three plans 
which will now be specifically defined obviously relate to more realistic situa- 
tions for control of industrial processes. The three plans are given in language 
which assumes some familiarity with MLP, which is given in detail in [9]. 

(a) The MLP-r X 1 Plan. We say we are in the jth sampling level if every 
(1/f)’-th item produced is systematically sampled. If i consecutively inspected 
items are found clear of defects when sampling at the jth level, begin sampling 
at the (j + 1)-th level. On the other hand, if a defective item is found before 
this is accomplished, revert immediately to the (7 — r)-th level, if 7 > r, or to 
the zero level, that is, one hundred per cent inspection if 7 S r. Let inspection 
begin at the zero level. When r = 1, we have the MLP plan described in [9]. 

(b) The MLP-T Plan. This is exactly the same as the MLP-r X 1 Plan, 
except that when a defective is encountered, we immediately revert to one hun- 


Received May 24, 1956; revised September 14, 1956. 

1 Work sponsored in part by the Office of Scientific Research, U. 8. Air Force. 
? Work sponsored in part by the Office of Naval Research. 

* Work spensored in part by the Higgins Fund, Columbia University. 


395 





396 C. DERMAN, 8. LITTAUER AND H. SOLOMON 


dred per cent inspection. This is obviously the tightest of the three multi-level 
plans considered in this paper and thus bears the label MLP-T. 

(ec) The MLP-r X s Plan. This plan follows exactly the same pattern 
as the MLP-r X 1, except that when 7 consecutively inspected items are found 
nondefective while on the jth sampling level, systematic sampling begins at 
level (j + s). We shall consider the case r > s, since we are concerned only with 
tightened multi-level plans. If r = s, we are effectively using the MLP Plan. 


2. Summary. Each of these generalizations can be appraised under the as- 
sumption of an infinite number of sampling levels or a finite number, /, of sam- 
pling levels. Under the assumption of an infinite number of allowable sampling 
levels, it is possible to obtain explicit relationships between the AOQL and the 
parameters of the plan for MLP-r X 1 and MLP-T. Thus it is possible to 
graph contours of equal AOQL for each of these plans under these conditions. 
Approximations for contours of equal AOQL for the MLP-r X s Plan are then 
sasily obtained. This makes feasible the possibility of a catalogue of continuous 
sampling plans which contains plans having a prescribed AOQL and thus aids 
immeasurably in the choice of an appropriate plan. As is demonstrated in the 
next sections, the following results are obtained, assuming that the production 
process is in statistical control and items found defective on inspection are re 
placed with good items. For the MLP-r X 1 Plan: 


(ff er+l\ l/s 
(2.1) AOQL =1- (7— ) 


=F, 


When r 1, this reduces to the result previously obtained in [9]. For 
the MLP-T Plan: 


(2.2) AOQL = 1 — f’". 


This result can also be obtained heuristically by letting r approach infinity in 
MLP-* X 1. For the MLP-r X s Plan (r > s) bounds and sometimes exact 
AOQL’s can be obtained using the previous two results. For example, if r | 
and s = 2 and f is given, the MLP-2 X 1 Plan for f’ = f° will be the same plan 
and hence have the same AOQL. More generally for a given f we can write 


(2.3) AOQL 1 25 < AOQL,., < AOQL,::, 


where r’ = greatest number less than r that is a multiple of s, and r” is the 
smallest number greater than r that is a multiple of s. For, if r’ < r”, the plan 
associated with r” is tighter and the added protection thus insures a better 
outgoing quality, i.e., a smaller AOQL. Under the assumption of a finite number, 
k, of allowable sampling levels, the AOQ function for MLP-T is obtained, and 
it is seen that the use of digital computers may be expedient for the computation 
of AOQL contours. This was exactly the situation, for finite levels, in [9]. The 
main results of the paper are obtained through the use of Markov chain tech- 
niques which are developed in Section 3. In these plans inspection, as described, 





SAMPLING PLANS 397 


is by systematic sampling. However, the AOQ and AOQL results also hold when 
inspection in each level is accomplished by random sampling—i.e., in the kth 
level, each item in the block of f“ items has probability f* of being chosen for 
inspection. 


3. Markov Chain Result. Let {X,}(n = 0, 1,--- ) denote an irreducible 
recurrent positive Markov chain with states {Z;} (j = 0, 1,--- ). Let {pis} 
(i, j 0, 1, --- ) denote the probability of transition from state E; to Z;. 
It is known (see [5]), that a unique sequence {v,} exists such that 


~ ViPi = %j, (j 


i=0 
(i 
=1l. 


The v,’s are sometimes referred to as “‘steady state’”’ probabilities. 

Now let A = {E;,} be a subset of the states. Let Yo, Yi, --- be successive 
members of {X,} which take on values in A. Since the chain is recurrent, infi- 
nitely many such Y’s will exist with probability one. It was shown by Derman 
[2] that {¥,} (& = 0, 1, --- ) isalsoa Markov chain; and if {p;;} (EZ; , B;e¢ A) 
are its transition probabilities, then the solutions v; of 

DU Pis = 2 (E;e A), 


ByeAa 


v,>0 (E;e A), 


(E;e@ A). 


de %5 


Bieta 


Suppose A, = {E;} (j = 1, 2,---); As = {Bj} G = 2, 3,---)3--- Ay = 
{E;} (§ = 9,9 + 1,-+-) +++ are subsets to be considered. Let { Yi(g)} denote 
the Markov chain defined over A, . Also let E;(g) (j = 0, 1, --- ), the states 
for the chain { Y;(g)}, be a relabeling of the states E,(k = g,--- ) by letting 
j = k — g. Finally let p;;(g) denote the probability of transition from state 
E(g) to state E,(g) in the chain { Y;,(g)}. Our main tool is the following theorem 

TueoreM. If p;; = pis(g) (4,7 = 0,--- 3g = 1,--- ), then 


(3.4) vs = vo(l — v9)’ Gj = 1,---). 


Proor. Let {v,;(g)} denote the solution of (3.1) for the chain { ¥;(g)}. Since 
the transition probabilities, by hypothesis, are the same regardless of which 





398 C. DERMAN, 8S. LITTAUER AND H. SOLOMON 


chain is under consideration, v,(g) = v; (¢ = 0, 1, --- ). However, from (3.3) 
we have 


> “9 


Thus by induction, 


Vo(1 a 6s 4 ae v;-1) 


§—i 


(3.6) Vo {1 a >» (1 — Uo)'| 


i=1 
= vo (1 — Vo)’ 


and the theorem is proved. 
We shall apply the theorem in the following case. Suppose 


Pii+t a > 0 
Pio=l—a 
Diew=l—a (t>r). 


It is clear that the chain is irreducible. It also follows from a slightly modified 
theorem of Foster ({6], Theorem 5, p. 81) that the chain is recurrent positive if 
a < r/(r + 1). Intuitively this condition guarantees a sufficient pull to the 
left, thereby insuring the existence of the steady-state probabilities inherent 
in a recurrent positive chain. Furthermore, it is easily seen that the conditions 
of the theorem are satisfied so that the v; have the form (3.4). From 
(3.1), 7 = 0, vo is determined by the following equation 

, 


f \r+l) 
,fl-(Q-ov 
(- a) — Rant = 1, 


7 
‘ 


> 
\o. 
and thus any v; can be obtained. 


4. Application to MLP-r X 1 infinite-level plan. The multilevel plans can 
now be studied from the point of view of a Markov chain {X,,} and the results 
in Section 8 employed. We let Ejmn(j = 0,1, --- ;m = 0,---,7%— 1) denote 
the state of such a chain where we say that X, is in state Z;,, if just after the 
nth item has been inspected, the process is in the jth sampling level (i.e., every 
(f ’)th item inspected) and m nondefectives have been observed successively 
while in the jth level. Suppose the process is in a state of control such that p 
is the probability of a defective being produced. The transition probabilities 
are then given by 


P(E jm ——- E 5,m+1) -i- y= 8s 
(7 = 0,1, --- ;m=0,1,---,#— 2) 





SAMPLING PLANS 


P(E; «1 > Ej41,0) a | 
P(E jm — Ej-+0) = P 
P(E jm — Eoo) = p i= +++ r— 1). 


The chain is easily seen to be irreducible. From Foster’s theorem it is seen to 
be recurrent positive if g' < r/(r + 1). We shall assume g < r/(r + 1) for the 
present. Now let A = {Ej} be a subset of the states and let {Y,} denote the 
chain defined over it. The chain is of the form of the special case considered in 
section 3 with a = gq’. Let {v;} and {v;,} denote the steady-state probabilities 
of the chains { Y;} and {X,}, respectively. Using (3.1), (3.5) and (4.1) it follows 
that 


1 — yim ’ 
(4.2) Vin = T : v;q (m = 0,1, --- = 0,1,--- 
— q* 


For from (3.1) 
i m 
Vim = Vj0q 
(m = 0,--- 


and from (3.5) 


’ V0 


ae 
, Ux0 
k=0 


v; 


Hence, 
Vin = - VEO v; q” 
k=0 
but summing over j and m we get, since )>;.mUjm = 1, 
ye = = Se 
> k0 i-¢ 
From (4.2) it is clear that v; is the sum of the steady-state probabilities of being 
in the jth level of sampling. Also from (3.4) 
(4.3) vj = vo (1 — 0)’ 
where vo is given by (3.7) with a = q'; namely, 
. a 1 _ ares 
a —¢) es “a 
Vo 


where as previously remarked, vp is the probability of being in one hundred 
per cent inspection. 

Now that we have expressions for the steady-state probabilities, we proceed 
with the derivation of the AOQ functions and the AOQL. Let h(X,) = f’ for 





400 C. DERMAN, 8. LITTAUER AND H. SOLOMON 


Xn = Ejm. It is easily verified that the reciprocal of the average fraction in- 
spected after n inspections is 


(4.4) Fz! = +> a(x). 

T v1 
It follows from the Birkhoff ergodic theorem, applicable for stationary Markov 
chains of the type considered here (see Doob [1], p. 460), that 


oo s—1 

(4.5) F* = lim F;’ = > f7? Dd vm. 
no 7=0 m=0 

Now F™ denotes the reciprocal of the average fraction inspected for all sequences 
(except for a set having probability 0); for let % = S~h-1h(Xm) = number of 
items produced during the first k inspections. Formula (4.5) says that k/ — F 
ask — o, Let t% < t < t& 4: . Then since k = number of items inspected in the 
first ¢ items produced, the inequalities 


< 


<-s 


k k k 
, os 


ths 

imply that lim,... k/i — F with probability 1. 
If g' = r/(r + 1), it can be shown more directly that F-' = © with prob- 
ability 1. If vo exists and is positive, it follows from the theory of recurrent 


Markov chains that g° < r/(r + 1). Thus since 0 < f < 1, we have from (4.2), 
(4.3), (4.5) and the last remark that 


when (f > 1 — 0), 
(4.6) 


otherwise. 


Hence since it can easily be shown that AOQ = p(1 — F), we have 


A OQ = (1 ae q) (+2) 1 — Yo : when (f >1- v0), 
(4.7) f Yo 
=1-—gq, otherwise. 


Now suppose it is true that the AOQ is an increasing function of g as long as 
f > 1 — 1. Then from (4.7) it would follow that 


(4.8) AOQL = 1—, 


where go is the value of q such that f = 1 — v9. From (3.7) with a = ¢°, it is 
easily established that 


ow fh 1s 
qo = (45 ’ 





SAMPLING PLANS 
so that 
_ gtti\ i 
(4.9) a ¢ J ) 


l— fA 
We now show that the AQQ is an increasing function of q as long as 


r+1\ l/s 
q< as (i.e., f > 1 — vp). 


‘ 


(; i ,) 40Q = (1 — g) ; ae 


Vo 


Then 


(4.10) SP = -v@ +a - go 2O. 
q 


It is necessary to show that the right-hand side of (4.10) is positive or 


(4.11) nel 8 es 


— 9 V@ 
i dq 
But, using (3.7) with a = 


1 


(4.12) dA ‘ (-3) i ___ 
q Vo nn ra ce 


Thus the left side of (4.11) becomes 
—( = g(r + D0 — w) "A — ¢/) — (1 — 00) 
ig*(1 — q) ! 


From (3.7) it follows that (1 — »)"*? = [(1 — 1) — gV/a — q'). Hence (4.13) 
becomes 


(4.14) —{ G—*) E S .. 64 »]. 
t\l—-q qg 


(4.13) 


But from (3.7), 


| =O ee 1 


Sm. (1 ee oy = 





402 C. DERMAN, 8S. LITTAUER AND H. SOLOMON 


Hence 


(1 — vr , , 
q' 
and the smallest value over the range f > 1 — vo which the bracket factor in 
(4.14) can take is minus one. Thus the largest value that (4.14) can reach is 


(4.15) (1 — ¢) (7). 
1 — q 7 


But 


1-q (2) Sg SF h-- SE. 


l—q \s 


7 
This proves (4.11). 
5. The MLP-T Plan. We consider first an infinite number of sampling levels. 
Let E;,, be as in the previous section. The transition probabilities are now 
P(E jm — Ej,m4i1) = 
(7 = 0,1,--: ; 
P(E 5,51 — Ej4i0) = 9 
P(E jm — Eo) = 1-— q (for all 7, m). 
Of course, 0 < g < lL. 
It can be shown in this case that 
Vin = py 
(7 = 0,1,--+ 5m 
and as before that 
gt > fH jm < l —- 
im 


io 
I 


Aog = La OG (= ) 


1 — qf f 


=l-gq 


It can easily be shown that AOQ is an increasing function of q for 0 < q' < f. 





SAMPLING PLANS 


Hence, 
AOQL = 1—f"". 

Now let the number of sampling levels, k, be finite. For this case we need only 
modify the function h(X,) such that 

h(X,) =f’ when Xn = Ejm Gj =k), 

=f when X, = Ein j>k), 
where here we persist with the notation #;,, as if the k = © plans are in effect. 
In similar fashion we have 
k—1 i—l1 x2 i—1 


Pap Lore + py Lr 


=) m=() y=k m=0 


ts k 
aa — gi) LHD + py. 
l1— qi/f 


For / 1, we have the Dodge Plan, and get the following result as in [3]: 


f 


1 ; 
roe f+q(l—f) 


shire ((l-f 28 >) 


In order to obtain AOQL contours for this situation, as for higher values of k, 
the use of digital computers would be expedient. 


REFERENCES 


{1} A. H. Bowker, “A survey of continuous sampling plans,’’ Proceedings of the Third- 
Berkeley Symposium on Mathematical Statistics and Probability, Vol. V, Uni 

versity of California Press, 1956, pp. 75-86. 

{2} C. Derman, “Some contributions to the theory of denumerable Markov chains,” Trans. 
Amer. Math. Soc, Vol. 79, No. 2 (1955), pp. 541-555. 

(3) H. F. Dona, “A sampling inspection plan for continuous production,’’ Ann. Math. 
Stat., Vol. 14 (1943), pp. 264-279. 

[4] J. L. Doon, Stochastic Processes, John Wiley and Sons, 1953. 

[5] W. Fevier, Probability Theory and Its Applications, John Wiley and Sons, 1950. 

[6] F. G. Fosrer, ‘“Markoff chains with an enumerable number of states and a class of 
cascade processes,’’ Proc. Cambridge Phil. Soc., Vol. 47 (1951), pp. 77-85. 

[7] M. A. Grrsuick, ‘‘A sequential inspection plan for quality control,’’ Technical Re- 
port No. 16, Applied Mathematics and Statistics Laboratory, Stanford Uni- 
versity, 1954. 

8] J. A. GreeNwoop, ‘“‘A continuous sampling plan and its operating characteristics,’’ 
Bureau of Ordnance, Navy Dept., Washington, D. C., unpublished memo- 
randum. 





404 C. DERMAN, 8S. LITTAUER AND H. SOLOMON 


9] G. LieBERMAN AND H. Sotomon, ‘‘Multi-level continuous sampling plans,’’ Ann. 
Math. Stat., Vol. (26), No. 4 (1955), pp. 686-704. 

{10} G. Resnrxorr, ‘‘Some modifications of the Lieberman-Solomon multi-level continu 
ous sampling plan, MLP,”’ Technical Report No. 26, Applied Mathematics and 
Statistics Laboratory, Stanford University, 1956. 

{1l] A. Wap anp J. WotrowrrTz, “Sampling inspection plans for continuous production 
which insure a prescribed limit on the outgoing quality,’’ Ann. Math. Stat., 
Vol. 16 (1945), pp. 30-49. 





ON SOME CHARACTERIZATION PROBLEMS CONNECTED 
WITH LINEAR STRUCTURAL RELATIONS 


By R. G. Lana 
Indian Statistical Institute, Calcutta 


1. Introduction. The problems concerned with the characterizations of the 
distribution laws of random variables when they are connected by a linear 
structural relation seem to originate from the stimulating problem first pro- 
posed by Ragnar Frisch before the Oxford Conference of the Econometric 
Society in 1936. His problem may be stated as follows. Let xz» and x; be two 
observable random variables connected by a linear structural set up, 


Xo = AE + mM, 
N= agti+n, 


where £, mo and m are mutually independent random variables, and a and a 
are some unknown constants. What are the conditions on the distribution laws 
of the random variables £, m and m under which the regression of zx» on x; and 
also that of z; on Zo is linear, irrespective of the values of the constants a) and a;? 

A partial solution to the problem of Ragnar Frisch was given by Allen [1] 
by proving that if the first two moments of m and all the moments of £ and m 
exist, then a necessary and sufficient condition for the regression of xz» on 2; 
to be linear irrespective of the values of the constants a» and a; is that both and 
m are normally distributed. A more general theorem was proved later independ- 
ently by Rao [10], [11] and Fix [4] as a complete solution to the problem of Ragnar 
Frisch. Rao-Fix’s theorem may be stated as follows: Let £, m0 and m be three 
mutually independent proper random variables each having a finite expectation. 
Then a necessary and sufficient condition for the regression of x9 = act + 10 
on x; = a& + m to be linear for some a ~ 0 and for all a; contained in a 
closed interval is that both £ and m; should belong to a class of stable laws with 
finite expectation. 

tecently the author [6] has obtained a generalization of Rao-Fix’s theorem 
in a new direction, replacing the condition of stochastic independence of 
and m by the weaker assumption that the regression of mo on 7 is linear. The 
author [5] has also obtained a characterization of the normal law from the con- 
sequence of the linearity of multiple regression of one random variable on several 
others, when the variables are connected by a linear structural relation as in 
the case of the bifactor theory of Spearman. Several analogous characterization 
problems connected with linear structural relations have also been solved re- 
cently by Ferguson [3]. In the present paper we shall consider some generaliza- 
tions of these problems in various directions. In Section 4, some theorems on 


Received June 14, 1956. 





406 R. G. LAHA 


the dependent error variables (Theorems 4.1 and 4.2) are deduced and a general 
theorem on a higher dimensional linear structure is proved in the subsequent 
section (Theorem 5.1). 

2. Definitions and assumptions. Throughout the present investigation we 
shall confine our attention to a set of observable random variables xo , 71, --- 


x, having the following linear structural set up: 


r= anks T Arte tee t+ opt p + No , 
e i I 
Qué + Gk + + Qiptp +m, 


(2.1) 


in = Gniti + Onaks + °°° + Gnptp + Mn; 
where & , &, --- , &» are usually called the latent or hypothetical variables and 
no, m1, °°*, the error variables of the linear structural relation, and a;;’s 
are a certain fixed set of constants. 
Now using the notations of vectors and matrices, the equation (2.1) may be 
rewritten as 


, / 


(2.2) (aoix) = E(ag? A’) + (moi n) 


where 


x= (%1,2% (ieee Xn), -_= (&) J & se 
i at .. ) 
ao = (dor , Gor , *** 5 Aop), 7 = (Mm, 925 °°* » Ma), 


and A = (aj;)jn1,2,---n;ja1.2,---p and ap and A’ denote respectively the transposes 
of a and A. 

We shall now make the following assumptions on the distributions of the ran- 
dom variables concerned. 

AssumMPTION 1. The conditional distribution of zo for fixed 2, x2, --- , 2, 
is assumed to exist, wherever needed. 

AssumpTION 2. The set of random variables £ , f&,--- , &» is distributed in- 
dependently of the set of error variables no, m, °°: , Mm - 

AssuMPTION 3. The latent variables & , &,--- , &» are mutually independent 
proper random variables each having a finite expectation (which is assumed to 
be zero without any loss of generality in the proof) as well as a finite variance 
o;,j = 1,2, ---, p. Similarly the error variables m, m,--- , m™ are mutually 
independent (not necessarily proper) random variables each having a finite 
expectation (which is also assumed to be zero) as well asa finite variance 6, 
k = 0,1, ---,n. Let the dispersion matrices of the random vectors — and 7 
be denoted by = and A respectively such that 


9 


a} 





CHARACTERIZATION PROBLEMS 407 


Now it should be noted that some or all of the elements of the matrix A may 
be zero. 

But in Section 4, where some results concerned with dependent error variables 
are obtained for the special case of the above structure with p = | and n 2 2, 
it is assumed that all the random variables are proper and have only finite expec- 
tations and further the multiple regression of 4 on m, m2, °-* , » is linear. 

The role of these assumptions is to ensure the existence of the expectation and 
the variance of the conditional distribution of x» for fixed x, , x. , --- , x, which 
we denote by E(xo! 2, t2, +--+ , 2a) and V(x | 2, t2,--- , tn) respectively. 


3. Some lemmas. We give below some lemmas which are useful in proving 
the theorems in the subsequent sections. 

Lemma 3.1. Let 2% ,2%1,--- , X, be a set of n + 1 proper random variables each 
having a finite expectation (which is assumed to be zero without any loss of general- 
ity) as well as a finite variance. Then the necessary and sufficient conditions for 


(3 1) (B(x | a, Ta, *** ta) @ Bir + Bors Tr 1s ob 8 was 
~. V (te | 241,22, °°°,4n) = o ae. 


are that the equations 


Bp(to, th, +++, ta) — Fg, ests +t 
Oto to=0 j=l at 


Feo(to, th, «++, tn) 2 = Fe(O,t,-°-, 
ey _— (O,t:, -°°, tia) + 8; B, : 
ats, t o=6 — , z ot; dt, 


are to be satisfied for all real t; , tz, --+ , tn whereg(to,t,,--+,tn) andg(0O,t,,--+ , tn) 
represent respectively the characteristic functions of the distributions of (x, 
U1, °**, tn) and (a, Xe2,-+-*, Ln) and further 8B, Bo, --- , Bn and o > 0 are 
arbitrary constants. 

When the random variables $x_0, x_1, \dots, x_n$ satisfy the relations in (3.1), we say that the multiple regression of $x_0$ on $x_1, x_2, \dots, x_n$ is linear and that the conditional distribution of $x_0$ for fixed $x_1, x_2, \dots, x_n$ is homoscedastic.


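Proof. To prove that the conditions are necessary, we can easily verify that

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = i\,E\Big\{E(x_0 \mid x_1, \dots, x_n) \exp\Big(i \sum_{k=1}^{n} t_k x_k\Big)\Big\} = i\,E\Big\{\sum_{j=1}^{n} \beta_j x_j \exp\Big(i \sum_{k=1}^{n} t_k x_k\Big)\Big\} = \sum_{j=1}^{n} \beta_j \frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j}.$$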





408 R. G. LAHA 


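Similarly, since $E(x_0^2 \mid x_1, \dots, x_n) = \sigma^2 + \big(\sum_{j=1}^{n} \beta_j x_j\big)^2$,

$$\frac{\partial^2 \varphi(t_0, t_1, \dots, t_n)}{\partial t_0^2}\bigg|_{t_0=0} = -E\Big\{\Big(\sigma^2 + \sum_{j=1}^{n}\sum_{k=1}^{n} \beta_j \beta_k x_j x_k\Big) \exp\Big(i \sum_{l=1}^{n} t_l x_l\Big)\Big\} = -\sigma^2 \varphi(0, t_1, \dots, t_n) + \sum_{j=1}^{n}\sum_{k=1}^{n} \beta_j \beta_k \frac{\partial^2 \varphi(0, t_1, \dots, t_n)}{\partial t_j\, \partial t_k}.$$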


To prove the sufficiency of the conditions, we note simply that (3.2) may be rewritten as

$$E\Big\{\Big[E(x_0 \mid x_1, x_2, \dots, x_n) - \sum_{j=1}^{n} \beta_j x_j\Big] \exp\Big(i \sum_{l=1}^{n} t_l x_l\Big)\Big\} = 0$$

and

$$E\Big\{\Big[E(x_0^2 \mid x_1, x_2, \dots, x_n) - \sigma^2 - \sum_{j=1}^{n}\sum_{k=1}^{n} \beta_j \beta_k x_j x_k\Big] \exp\Big(i \sum_{l=1}^{n} t_l x_l\Big)\Big\} = 0.$$

Then (3.1) follows immediately from the uniqueness theorem of Fourier transforms of functions of bounded variation.

For the special case $n = 1$, this reduces to the lemma proved independently by Rao [9] and by Rothschild and Mourier [12].
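As a numerical illustration of Lemma 3.1 (not part of the original argument), one may take $(x_0, x_1, x_2)$ to be a zero-mean normal vector with an assumed dispersion matrix $S$; its regression is linear with $\beta = S_{11}^{-1} s_{10}$ and its conditional variance is the constant $\sigma^2 = s_{00} - s_{01}\beta$. The sketch below approximates the derivatives in (3.2) by central differences, so both residuals should vanish up to discretisation error. The matrix $S$, the point $t$, the step size and the function names are illustrative assumptions.

```python
import numpy as np

def normal_cf(t, S):
    """Characteristic function of a zero-mean normal vector with dispersion matrix S."""
    t = np.asarray(t, dtype=float)
    return float(np.exp(-0.5 * t @ S @ t))

def lemma_3_1_residuals(S, t_rest, h=1e-4):
    """Left side minus right side of both equations (3.2), using central differences.

    S is the dispersion matrix of (x0, x1, ..., xn) and t_rest = (t1, ..., tn)."""
    n = len(t_rest)
    beta = np.linalg.solve(S[1:, 1:], S[1:, 0])   # E(x0 | x1..xn) = beta' x
    sigma2 = S[0, 0] - S[0, 1:] @ beta             # constant conditional variance
    base = np.concatenate(([0.0], t_rest))         # the point (0, t1, ..., tn)
    step = h * np.eye(n + 1)                       # step[i] moves coordinate i by h

    def phi(shift):
        return normal_cf(base + shift, S)

    # First equation of (3.2).
    d0 = (phi(step[0]) - phi(-step[0])) / (2 * h)
    grad = np.array([(phi(step[j]) - phi(-step[j])) / (2 * h) for j in range(1, n + 1)])
    r1 = d0 - beta @ grad

    # Second equation of (3.2).
    phi0 = phi(np.zeros(n + 1))
    d00 = (phi(step[0]) - 2 * phi0 + phi(-step[0])) / h**2
    hess = np.array([[(phi(step[j] + step[k]) - phi(step[j] - step[k])
                       - phi(step[k] - step[j]) + phi(-step[j] - step[k])) / (4 * h**2)
                      for k in range(1, n + 1)] for j in range(1, n + 1)])
    r2 = d00 - (-sigma2 * phi0 + beta @ hess @ beta)
    return r1, r2

S = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
r1, r2 = lemma_3_1_residuals(S, [0.7, -0.4])
print(r1, r2)   # both close to zero
```

Replacing `sigma2` by any other constant makes the second residual clearly nonzero, which is the homoscedasticity part of the lemma.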
Lemma 3.2. Let $x_0, x_1, \dots, x_n$ be a set of $n + 1$ proper random variables each having a finite expectation (which is assumed to be zero). Then the necessary and sufficient condition for

$$E(x_0 \mid x_1, x_2, \dots, x_n) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n \quad \text{a.e.}$$

is that the equation

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = \sum_{j=1}^{n} \beta_j \frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j}$$

be satisfied for all real values of $t_1, t_2, \dots, t_n$.

This lemma has already been proved independently by the author [5] and Ferguson [3].
Lemma 3.3.¹ Let $x_1, x_2, \dots, x_n$ be $n$ independent proper random variables and let further $\varphi_j(t)$ denote the characteristic function of the distribution of $x_j$, $j = 1, 2, \dots, n$. If now the functions $\varphi_j(t)$ satisfy the equation

$$\prod_{j=1}^{n} [\varphi_j(t)]^{a_j} = e^{Q(t)}$$

for all real $t$ in a certain neighbourhood of the origin $|t| < \delta$ $(\delta > 0)$, where the $a_j$'s are some positive numbers and $Q(t)$ is a quadratic polynomial in $t$, then each $x_j$ follows a normal distribution.

¹ The proof of this lemma is given in A. A. Zinger and Yu. V. Linnik [13].







This lemma may be regarded as an analytical extension of Cramér’s theorem 
on the normal law and has been proved by Linnik [8]. The proof of this lemma 
has been given by the author in [7]. 


4. Some results for the case of dependent error variables. We shall now obtain some results connected with dependent error variables for the special case of the above linear structure when $p = 1$ and $n \ge 2$.

Theorem 4.1. Let the observable random variables $x_j$ $(j = 0, 1, \dots, n)$ have the linear structural set-up $x_j = a_j \xi + \eta_j$, where the $a_j$'s are fixed nonzero constants, and further let $\xi$ and $\eta_0, \eta_1, \dots, \eta_n$ be proper random variables each having a finite expectation (which is assumed to be zero without any loss of generality) such that

(i) $\xi$ is distributed independently of $(\eta_0, \eta_1, \dots, \eta_n)$,

(ii) $E(\eta_0 \mid \eta_1, \eta_2, \dots, \eta_n) = \sum_{j=1}^{n} \beta_j \eta_j$, the $\beta_j$'s being a set of constants;

then the multiple regression of $x_0$ on $x_1, x_2, \dots, x_n$ is always linear whenever the relation $a_0 = \sum_{j=1}^{n} a_j \beta_j$ is satisfied.

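Proof. Let $\varphi(t_0, t_1, \dots, t_n)$, $\varphi_0(t_0, t_1, \dots, t_n)$ and $\Phi(t)$ represent the characteristic functions of the distributions of $(x_0, x_1, \dots, x_n)$, $(\eta_0, \eta_1, \dots, \eta_n)$ and $\xi$ respectively.

Then we can write

$$\varphi(t_0, t_1, \dots, t_n) = E\Big[\exp\Big(i \sum_{j=0}^{n} t_j x_j\Big)\Big] = \Phi\Big(\sum_{j=0}^{n} a_j t_j\Big)\, \varphi_0(t_0, t_1, \dots, t_n). \tag{4.1}$$

Again, since it is given that $E(\eta_0 \mid \eta_1, \eta_2, \dots, \eta_n) = \sum_{j=1}^{n} \beta_j \eta_j$, by applying Lemma 3.2 we easily get

$$\frac{\partial \varphi_0(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = \sum_{j=1}^{n} \beta_j \frac{\partial \varphi_0(0, t_1, \dots, t_n)}{\partial t_j}. \tag{4.2}$$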


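Now differentiating both sides of equation (4.1) with respect to $t_0$, then putting $t_0 = 0$, and finally using equation (4.2), we have

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = a_0\, \Phi'\Big(\sum_{j=1}^{n} a_j t_j\Big)\, \varphi_0(0, t_1, \dots, t_n) + \Phi\Big(\sum_{j=1}^{n} a_j t_j\Big) \sum_{j=1}^{n} \beta_j \frac{\partial \varphi_0(0, t_1, \dots, t_n)}{\partial t_j}. \tag{4.3}$$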


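Again putting $t_0 = 0$ on both sides of equation (4.1) and then differentiating both sides with respect to $t_j$, $j = 1, 2, \dots, n$, we get

$$\frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j} = a_j\, \Phi'\Big(\sum_{k=1}^{n} a_k t_k\Big)\, \varphi_0(0, t_1, \dots, t_n) + \Phi\Big(\sum_{k=1}^{n} a_k t_k\Big) \frac{\partial \varphi_0(0, t_1, \dots, t_n)}{\partial t_j}, \qquad j = 1, 2, \dots, n. \tag{4.4}$$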


Now it is given that $a_0 = \sum_{j=1}^{n} a_j \beta_j$; hence, substituting this value of $a_0$ in (4.3) and comparing with (4.4), it is easy to obtain

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = \sum_{j=1}^{n} \beta_j \frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j}. \tag{4.5}$$

The proof then follows at once by applying Lemma 3.2 to equation (4.5).
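A small discrete example, constructed here for illustration and not taken from the paper, shows Theorem 4.1 at work without any normality. In the sketch below $\xi$, $\eta_1$, $\eta_2$ and an auxiliary variable $\varepsilon$ are assumed independent and uniform on $\{-1, +1\}$, and $\eta_0 = \beta_1 \eta_1 + \beta_2 \eta_2 + \varepsilon$, so hypothesis (ii) holds. The code enumerates all outcomes, computes $E(x_0 \mid x_1, x_2)$ exactly, and measures its distance from the best linear function $c_1 x_1 + c_2 x_2$. With $a_0 = a_1\beta_1 + a_2\beta_2$ the distance is zero; with another value of $a_0$ it is not. The function names and constants are hypothetical.

```python
from collections import defaultdict
from itertools import product

import numpy as np

def conditional_means(a, beta):
    """E(x0 | x1, x2) by exhaustive enumeration of a finite, equiprobable model.

    xi, eta1, eta2, eps are independent and uniform on {-1, +1};
    eta0 = beta1*eta1 + beta2*eta2 + eps, so E(eta0 | eta1, eta2) is linear."""
    a0, a1, a2 = a
    b1, b2 = beta
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, e1, e2, eps in product((-1, 1), repeat=4):
        eta0 = b1 * e1 + b2 * e2 + eps
        key = (a1 * xi + e1, a2 * xi + e2)        # observed (x1, x2)
        sums[key] += a0 * xi + eta0               # x0
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def max_linear_residual(means):
    """Least-squares fit of E(x0 | x1, x2) by c1*x1 + c2*x2; returns (max |residual|, c)."""
    pts = np.array(list(means.keys()), dtype=float)
    y = np.array(list(means.values()))
    coef, *_ = np.linalg.lstsq(pts, y, rcond=None)
    return float(np.max(np.abs(pts @ coef - y))), coef

# a0 = a1*beta1 + a2*beta2 = 1*1 + 2*3 = 7: the regression is linear (Theorem 4.1).
res_lin, coef_lin = max_linear_residual(conditional_means((7, 1, 2), (1, 3)))
# a0 = 5 violates the relation, and here the regression is not linear.
res_non, _ = max_linear_residual(conditional_means((5, 1, 2), (1, 3)))
```

In the first case the fitted coefficients coincide with $(\beta_1, \beta_2)$.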

Theorem 4.2. With the same notations and assumptions as used in Theorem 4.1, together with the additional assumptions

(iii) the variables $\eta_1, \eta_2, \dots, \eta_n$ are mutually independent,

(iv) the constants $a_j$ satisfy the relation $a_0 \ne \sum_{j=1}^{n} a_j \beta_j$,

the necessary and sufficient condition for the multiple regression of $x_0$ on $x_1, x_2, \dots, x_n$ to be linear $(n \ge 2)$ is that $\xi$ and each of $\eta_1, \eta_2, \dots, \eta_n$ is normally distributed.


Proof.

Necessity. Let us suppose that $E(x_0 \mid x_1, x_2, \dots, x_n) = \sum_{j=1}^{n} \bar\beta_j x_j$. Then, using Lemma 3.2, we have

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = \sum_{j=1}^{n} \bar\beta_j \frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j}. \tag{4.6}$$

Next, using equations (4.3), (4.4) and (4.6) together and noting that $\eta_1, \eta_2, \dots, \eta_n$ are mutually independent random variables, we get after a little algebraic simplification


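$$\Big(a_0 - \sum_{j=1}^{n} a_j \bar\beta_j\Big)\, \Phi'\Big(\sum_{j=1}^{n} a_j t_j\Big) \prod_{j=1}^{n} \varphi_j(t_j) = \sum_{j=1}^{n} (\bar\beta_j - \beta_j)\, \Phi\Big(\sum_{k=1}^{n} a_k t_k\Big)\, \varphi_j'(t_j) \prod_{k \ne j} \varphi_k(t_k), \tag{4.7}$$

where $\varphi_j(t_j)$ represents the characteristic function of the distribution of $\eta_j$, $j = 1, 2, \dots, n$.

It can be easily shown that, under the conditions of the theorem, neither $a_0 - \sum_{j=1}^{n} a_j \bar\beta_j$ nor any of $\bar\beta_j - \beta_j$, $j = 1, 2, \dots, n$, in equation (4.7) can be equal to zero. Putting $t_k = 0$ for all $k \ne j$ in (4.7) and noting that $\varphi_k(0) = 1$ and $\varphi_k'(t_k)\big|_{t_k=0} = 0$ for $k = 1, 2, \dots, n$, we get

$$\Big(a_0 - \sum_{k=1}^{n} a_k \bar\beta_k\Big)\, \Phi'(a_j t_j)\, \varphi_j(t_j) = (\bar\beta_j - \beta_j)\, \Phi(a_j t_j)\, \varphi_j'(t_j), \qquad j = 1, 2, \dots, n. \tag{4.8}$$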


Let us now suppose that $\bar\beta_j - \beta_j = 0$ for some $j$, but $a_0 - \sum_{k=1}^{n} a_k \bar\beta_k \ne 0$. In this case equation (4.8) gives

$$\Phi'(a_j t_j)\, \varphi_j(t_j) = 0. \tag{4.9}$$

But since the characteristic function $\varphi_j(t)$ is continuous for all real $t$ and equal to unity at the origin, we always have $\varphi_j(t_j) \ne 0$ in a suitably chosen neighbourhood of the origin. Thus $\Phi'(a_j t_j) = 0$ for all $t_j$ in that neighbourhood of the origin, leading to the conclusion that the distribution of $\xi$ is improper.

Proceeding in the same way, it can be shown that if $\bar\beta_j - \beta_j \ne 0$ for any $j$, whereas $a_0 - \sum_{k=1}^{n} a_k \bar\beta_k = 0$, the distribution of the corresponding $\eta_j$ is improper. Thus both these cases contradict the conditions of the theorem. Now the only alternative left is when $a_0 - \sum_{j=1}^{n} a_j \bar\beta_j$ and each of $\bar\beta_j - \beta_j$, $j = 1, 2, \dots, n$, vanish simultaneously. But in this case we have $a_0 = \sum_{j=1}^{n} a_j \beta_j$, which is also contrary to the conditions of the theorem.
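Now, restricting the values of $t_1, t_2, \dots, t_n$ to a suitably chosen neighbourhood of the origin such that each of the factors occurring in the product

$$\Phi\Big(\sum_{j=1}^{n} a_j t_j\Big) \prod_{j=1}^{n} \varphi_j(t_j)$$

is different from zero, we divide both sides of equation (4.7) by this expression and thus obtain

$$\Big(a_0 - \sum_{j=1}^{n} a_j \bar\beta_j\Big)\, \theta'\Big(\sum_{j=1}^{n} a_j t_j\Big) = \sum_{j=1}^{n} (\bar\beta_j - \beta_j)\, \theta_j'(t_j), \tag{4.10}$$

where

$$\theta(t) = \ln \Phi(t) \quad \text{and} \quad \theta_j(t) = \ln \varphi_j(t), \qquad j = 1, 2, \dots, n.$$

Next, putting $t_3 = t_4 = \dots = t_n = 0$ in (4.10), we get

$$\Big(a_0 - \sum_{j=1}^{n} a_j \bar\beta_j\Big)\, \theta'(a_1 t_1 + a_2 t_2) = (\bar\beta_1 - \beta_1)\, \theta_1'(t_1) + (\bar\beta_2 - \beta_2)\, \theta_2'(t_2). \tag{4.11}$$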
Then, putting successively $t_1 = 0$ and $t_2 = 0$ in (4.11) and noting that

$$a_0 - \sum_{j=1}^{n} a_j \bar\beta_j \ne 0,$$

we easily get

$$\theta'(a_1 t_1 + a_2 t_2) = \theta'(a_1 t_1) + \theta'(a_2 t_2). \tag{4.12}$$

But $\theta'(t)$ being continuous in $t$, it at once follows from equation (4.12) that $\theta'(t)$ is a linear function of $t$ and hence $\theta(t)$ is a quadratic polynomial in $t$, thus establishing the normality of the variable $\xi$. Then the normality of the remaining variables $\eta_j$, $j = 1, 2, \dots, n$, follows simply from equation (4.8).

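Sufficiency. Let $\sigma^2$ denote the variance of the random variable $\xi$ and $\delta_j$ that of $\eta_j$, $j = 0, 1, 2, \dots, n$.

Under the conditions of the theorem, we get on using equation (4.3)

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = -\Big[\sum_{j=1}^{n} (a_0 a_j \sigma^2 + \beta_j \delta_j)\, t_j\Big]\, \Phi\Big(\sum_{j=1}^{n} a_j t_j\Big) \prod_{j=1}^{n} \varphi_j(t_j), \tag{4.13}$$

where

$$\Phi(t) = e^{-\sigma^2 t^2/2} \quad \text{and} \quad \varphi_j(t_j) = e^{-\delta_j t_j^2/2}.$$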


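Similarly we get, on using equation (4.4),

$$\frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j} = -\Big[a_j \Big(\sum_{k=1}^{n} a_k t_k\Big) \sigma^2 + \delta_j t_j\Big]\, \Phi\Big(\sum_{k=1}^{n} a_k t_k\Big) \prod_{k=1}^{n} \varphi_k(t_k), \qquad j = 1, 2, \dots, n. \tag{4.14}$$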
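Thus, using (4.13) and (4.14) together, we may write

$$\frac{\partial \varphi(t_0, t_1, \dots, t_n)}{\partial t_0}\bigg|_{t_0=0} = \sum_{j=1}^{n} \bar\beta_j \frac{\partial \varphi(0, t_1, \dots, t_n)}{\partial t_j}, \tag{4.15}$$

where the constants $\bar\beta_j$ are to be determined from the system of equations

$$\bar\beta_1 (a_1 a_j \sigma^2) + \dots + \bar\beta_j (a_j^2 \sigma^2 + \delta_j) + \dots + \bar\beta_n (a_n a_j \sigma^2) = a_0 a_j \sigma^2 + \beta_j \delta_j, \qquad j = 1, 2, \dots, n.$$

The proof follows at once by applying Lemma 3.2 to equation (4.15).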
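In the normal case treated in the sufficiency part, $\theta'(u) = -\sigma^2 u$ and $\theta_j'(t) = -\delta_j t$, so (4.8) reduces to $(a_0 - \sum_k a_k \bar\beta_k)\, a_j \sigma^2 = (\bar\beta_j - \beta_j)\, \delta_j$. The Python sketch below, with arbitrarily chosen (assumed) values of $a_0$, $a_j$, $\beta_j$, $\sigma^2$ and $\delta_j$, solves the linear system displayed after (4.15) for the $\bar\beta_j$ and checks this identity. It also shows that $a_0 - \sum_k a_k \bar\beta_k$ stays away from zero when $a_0 \ne \sum_k a_k \beta_k$, in line with the necessity argument.

```python
import numpy as np

def xbar_coefficients(a0, a, beta, sigma2, delta):
    """Solve the system below (4.15) for beta-bar_1, ..., beta-bar_n.

    Row j: sum_k beta-bar_k (a_k a_j sigma2 + delta_j [k == j]) = a0 a_j sigma2 + beta_j delta_j."""
    a, beta, delta = (np.asarray(v, dtype=float) for v in (a, beta, delta))
    lhs = sigma2 * np.outer(a, a) + np.diag(delta)
    rhs = a0 * a * sigma2 + beta * delta
    return np.linalg.solve(lhs, rhs)

# Illustrative constants with a0 != sum_j a_j beta_j (here sum_j a_j beta_j = -1.5).
a0, a, beta = 3.0, np.array([1.0, 2.0]), np.array([0.5, -1.0])
sigma2, delta = 1.5, np.array([0.8, 1.2])
bbar = xbar_coefficients(a0, a, beta, sigma2, delta)

c = a0 - a @ bbar                 # a0 - sum_k a_k beta-bar_k
lhs_48 = c * a * sigma2           # left side of (4.8) for normal xi, after cancelling -t
rhs_48 = (bbar - beta) * delta    # right side of (4.8) for normal eta_j, after cancelling -t
```

Solving the identity for $c$ gives $c\,(1 + \sigma^2 \sum_j a_j^2/\delta_j) = a_0 - \sum_j a_j \beta_j$, which explains why $c \ne 0$ exactly when $a_0 \ne \sum_j a_j \beta_j$.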

The following corollary can be easily deduced. 

Corollary 4.1. Let the observable random variables $x_j$ $(j = 0, 1, \dots, n)$ have the linear structural set-up $x_j = a_j \xi + \eta_j$, where the $a_j$'s are a set of nonzero constants and $\xi$ and the $\eta_j$'s are mutually independent proper random variables each having a finite expectation. Then the necessary and sufficient condition for the multiple regression of $x_0$ on $x_1, x_2, \dots, x_n$ to be linear (when $n \ge 2$) is that $\xi$ and $\eta_1, \eta_2, \dots, \eta_n$ are normally distributed.

This corollary has been proved earlier independently by the author [5] and 
Ferguson [3]. 


5. A theorem in general linear structure. We shall now consider a theorem on characterisation connected with the general linear structural set-up already defined by equation (2.1) in Section 2. In this direction, Ferguson [3] has obtained some necessary and sufficient conditions for the multiple regression of $x_0$ on $x_1, x_2, \dots, x_n$ to be linear irrespective of the values of the constants $a_{ij}$. In the case of the higher dimensional structure, no result has yet been obtained assuming the regression to be linear for just one set of values of the constants $a_{ij}$. We shall now show that it is possible to obtain some result for the general linear structural relation for only one set of values of the constants $a_{ij}$ (with some restrictions upon their values), under the additional assumption that the conditional distribution of $x_0$ given $x_1, x_2, \dots, x_n$ is homoscedastic and all the random variables concerned have finite variances.

We are now in a position to prove the following theorem: 

Theorem 5.1. In the general linear structural set-up (2.1) and under Assumptions 1, 2 and 3, if the constants $a_{ij}$ are subject to the following restrictions:

(i) the vector $a_j = (a_{1j}, a_{2j}, \dots, a_{nj})$ has at least one nonzero element for each $j = 1, 2, \dots, p$,

(ii) the matrix $(A\Sigma A' + \Lambda)$ is nonsingular, that is, the determinant $|A\Sigma A' + \Lambda| \ne 0$,

(iii) each of the elements of the vectors

$$a_0 \Sigma A' (A\Sigma A' + \Lambda)^{-1} \quad \text{and} \quad a_0 \big[I - \Sigma A' (A\Sigma A' + \Lambda)^{-1} A\big]$$

is different from zero,

then the necessary and sufficient condition for

$$E(x_0 \mid x_1, x_2, \dots, x_n) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n, \qquad V(x_0 \mid x_1, x_2, \dots, x_n) = \sigma_0^2$$

is that each of $\xi_1, \xi_2, \dots, \xi_p$ and each of the proper random variables amongst $\eta_1, \eta_2, \dots, \eta_n$ are normally distributed.

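Proof.

Necessity. Let $\varphi(t_0, t_1, \dots, t_n)$ denote the characteristic function of the distribution of $(x_0, x_1, \dots, x_n)$, $\Phi_j(t)$ that for the distribution of $\xi_j$ $(j = 1, 2, \dots, p)$, and $\varphi_k(t)$ that for the distribution of $\eta_k$ $(k = 0, 1, 2, \dots, n)$.

Then it is easy to obtain

$$\varphi(t_0, t_1, \dots, t_n) = E\Big[\exp\Big(i \sum_{k=0}^{n} t_k x_k\Big)\Big] = \prod_{j=1}^{p} \Phi_j\Big(\sum_{k=0}^{n} a_{kj} t_k\Big) \prod_{k=0}^{n} \varphi_k(t_k). \tag{5.1}$$

Now, under the assumptions of the theorem, applying equations (3.2) of Lemma 3.1 to (5.1) above, we get after some laborious algebraic computations, proceeding in the same way as in Section 4,

$$\sum_{j=1}^{p} \Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)\, \theta_j'\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) = \sum_{k=1}^{n} \beta_k\, \theta_k^{*\prime}(t_k), \tag{5.2}$$

$$\sum_{j=1}^{p} \Big\{a_{0j}^2 - \Big(\sum_{k=1}^{n} \beta_k a_{kj}\Big)^2\Big\}\, \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) = -(\sigma_0^2 - \delta_0) + \sum_{k=1}^{n} \beta_k^2\, \theta_k^{*\prime\prime}(t_k), \tag{5.3}$$

holding for all real $t_1, t_2, \dots, t_n$ in a suitably chosen neighbourhood of the origin, where

$$\theta_j(t) = \ln \Phi_j(t) \quad (j = 1, \dots, p), \qquad \theta_k^*(t) = \ln \varphi_k(t) \quad (k = 0, 1, \dots, n).$$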

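Under the assumption that each of the random variables concerned has a finite second moment, we may again differentiate both sides of equation (5.2) with respect to $t_l$ $(l = 1, 2, \dots, n)$ and thus obtain the set of equations

$$\sum_{j=1}^{p} a_{lj} \Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)\, \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) = \beta_l\, \theta_l^{*\prime\prime}(t_l), \qquad l = 1, 2, \dots, n. \tag{5.4}$$

Next, multiplying both sides of equation (5.4) by $\beta_l$ and adding for all $l = 1, 2, \dots, n$, we get

$$\sum_{j=1}^{p} \sum_{l=1}^{n} \beta_l a_{lj} \Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)\, \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) = \sum_{l=1}^{n} \beta_l^2\, \theta_l^{*\prime\prime}(t_l). \tag{5.5}$$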


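Now, using equation (5.3), we get the following simplification:

$$\begin{aligned} &\sum_{j=1}^{p} \Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)^2 \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) + \sum_{k=1}^{n} \beta_k^2\, \theta_k^{*\prime\prime}(t_k) \\ &\quad = \sum_{j=1}^{p} \Big\{a_{0j}^2 - 2a_{0j}\Big(\sum_{k=1}^{n} \beta_k a_{kj}\Big) + \Big(\sum_{k=1}^{n} \beta_k a_{kj}\Big)^2\Big\}\, \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) + \sum_{k=1}^{n} \beta_k^2\, \theta_k^{*\prime\prime}(t_k) \\ &\quad = -(\sigma_0^2 - \delta_0) - 2\sum_{j=1}^{p} \Big(\sum_{k=1}^{n} \beta_k a_{kj}\Big)\Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)\, \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) + 2\sum_{k=1}^{n} \beta_k^2\, \theta_k^{*\prime\prime}(t_k). \end{aligned} \tag{5.6}$$

Finally, using equation (5.5) in the right-hand side of (5.6), we obtain

$$\sum_{j=1}^{p} \Big(a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}\Big)^2 \theta_j''\Big(\sum_{k=1}^{n} a_{kj} t_k\Big) + \sum_{k=1}^{n} \beta_k^2\, \theta_k^{*\prime\prime}(t_k) = -(\sigma_0^2 - \delta_0). \tag{5.7}$$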
Since it is given that the matrix $(A\Sigma A' + \Lambda)$ is nonsingular, it can be easily shown that, under the condition $E(x_0 \mid x_1, x_2, \dots, x_n) = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$, $\beta_k$ is given by the $k$th element of the vector $a_0 \Sigma A'(A\Sigma A' + \Lambda)^{-1}$ for $k = 1, 2, \dots, n$. Similarly, $a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj}$ is given by the $j$th element of the vector $a_0[I - \Sigma A'(A\Sigma A' + \Lambda)^{-1} A]$ for $j = 1, 2, \dots, p$.

Thus, under the given restrictions on the $a_{ij}$'s, it follows that $a_{0j} - \sum_{k=1}^{n} \beta_k a_{kj} \ne 0$ for all $j = 1, 2, \dots, p$ and similarly $\beta_k \ne 0$ for all $k = 1, 2, \dots, n$. Then the proof of the necessity part follows at once by applying Linnik's result (Lemma 3.3) to equation (5.7).

The proof that the condition is sufficient follows easily from Cramér ([2], pp. 314-315).
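The vectors entering restrictions (ii) and (iii) of Theorem 5.1 are easy to compute for given constants. The sketch below uses hypothetical values of $a_0$, $A$, $\Sigma$ and $\Lambda$ (with one $\eta$ taken improper, i.e. with zero variance), forms $\beta = a_0 \Sigma A'(A\Sigma A' + \Lambda)^{-1}$ and $a_0[I - \Sigma A'(A\Sigma A' + \Lambda)^{-1}A]$, and reports whether every element is nonzero. It assumes, as in the proof, that $\mathrm{Cov}(x_0, x) = a_0 \Sigma A'$ and $\mathrm{Cov}(x) = A\Sigma A' + \Lambda$, so that $\beta$ solves the normal equations.

```python
import numpy as np

def theorem_5_1_vectors(a0, A, Sigma, Lam):
    """Return beta = a0 Sigma A' M^{-1} and a0 [I - Sigma A' M^{-1} A], with M = A Sigma A' + Lam."""
    M = A @ Sigma @ A.T + Lam
    if np.isclose(np.linalg.det(M), 0.0):
        raise ValueError("restriction (ii) fails: A Sigma A' + Lambda is singular")
    G = Sigma @ A.T @ np.linalg.inv(M)          # p x n
    beta = a0 @ G                               # regression coefficients, length n
    resid = a0 @ (np.eye(len(a0)) - G @ A)      # a_0j - sum_k beta_k a_kj, length p
    return beta, resid

# Hypothetical constants: p = 2 factors, n = 3 observed x's, eta_2 improper (variance 0).
a0 = np.array([1.0, -1.0])
A = np.array([[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
Lam = np.diag([0.5, 0.0, 1.0])
beta, resid = theorem_5_1_vectors(a0, A, Sigma, Lam)
restriction_iii = bool(np.all(beta != 0) and np.all(resid != 0))
```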
REFERENCES
[1] H. V. Allen, "A theorem concerning the linearity of regression," Stat. Res. Memoirs, Vol. 2 (1938), pp. 60-68.
[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[3] T. Ferguson, "On the existence of linear regression in linear structural relations," University of California Publications in Statistics, Vol. 2 (1955), pp. 143-166.
[4] E. Fix, "Distributions which lead to linear regressions," Proceedings of the First Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1949, pp. 79-91.
[5] R. G. Laha, "On Characterizations of Probability Distributions and Statistics," Ph.D. thesis submitted to Calcutta University, 1955.
[6] R. G. Laha, "On a characterization of stable law with finite expectation," Ann. Math. Stat., Vol. 27 (1956), pp. 187-195.
[7] R. G. Laha, "On a characterization of the normal distribution from properties of suitable linear statistics," Ann. Math. Stat., Vol. 28 (1957), pp. 126-139.
[8] Yu. V. Linnik, "On an analytical extension of Cramér's theorem on the normal law," Reports of the Leningrad and Moscow University Probability Seminars, 1954.
[9] C. R. Rao, M.A. thesis submitted to Calcutta University, 1943.
[10] C. R. Rao, "Note on a problem of Ragnar Frisch," Econometrica, Vol. 15 (1947), pp. 245-249.
[11] C. R. Rao, "A correction to 'Note on a problem of Ragnar Frisch'," Econometrica, Vol. 17 (1949), p. 212.
[12] C. Rothschild and E. Mourier, "Sur les lois de probabilité à régression linéaire et écart type lié constant," C. R. Acad. Sci. Paris, Vol. 225 (1947), pp. 1117-1119.
[13] A. A. Zinger and Yu. V. Linnik, "On an analytical extension of a theorem of Cramér and its applications (Russian)," Vestnik Leningrad Univ., Vol. 10 (1955), pp. 51-56.





RANDOM ORTHOGONAL TRANSFORMATIONS AND THEIR USE IN
SOME CLASSICAL DISTRIBUTION PROBLEMS IN
MULTIVARIATE ANALYSIS¹


By Robert A. Wijsman
Department of Statistics, University of California, Berkeley


0. Summary. Orthogonal matrices having elements depending on certain random vectors provide a useful tool in various distribution problems in multivariate analysis. The method is applied to the derivation of the distributions of Hotelling's $T^2$ and Wilks' generalized variance, the Bartlett decomposition, and the Wishart distribution.


1. Introduction. The purpose of this paper is to demonstrate a method for treating some distribution problems in multivariate normal analysis, and to apply this method to the derivation of the Wishart distribution, the Bartlett decomposition, and the distributions of Hotelling's $T^2$ and Wilks' generalized variance. A large number of different derivations of these statistics exist in the literature ([1], [2], [4], [6] to [23]), and which one is preferable is a matter of taste. The motivation for presenting yet another derivation of well-known results is the belief that the method presented here leads to the results faster than existing derivations, without the necessity of extensive preparation, and almost without computations. A further advantage of the method is that it leads immediately to a representation of the statistics mentioned in terms of combinations of independent normal variables. More specifically, apart from constant factors, Hotelling's $T^2$ is obtained as an $F$ variable, Wilks' generalized variance as a product of independent $\chi^2$ variables, while the Wishart distribution is simply related to the joint distribution of independent normal and $\chi^2$ variables (Bartlett decomposition [3]). These are known facts, stated explicitly by some authors ([2], [6], [16], [18]), but clearly demonstrated by only a few derivations in the literature: Elfving [6] and Ogawa [16] obtain the Bartlett decomposition; Anderson [2] and Elfving [6] obtain the generalized variance essentially as a product of $\chi^2$ variables; while Anderson [2] obtains $T^2$ essentially as an $F$ variable in a rather indirect way, by relating it to a multiple correlation coefficient. The method presented in this paper will lead to the results in a simple, direct, and unified way.

It can be expected that orthogonal transformations provide at least as powerful a tool in the multivariate case as in the univariate case. Indeed, an example can be found in the work of James [12]. The method followed in the present paper will also lean very heavily on orthogonal transformations of random variables. In order to utilize this tool to the utmost, most of the useful transformations will be performed with orthogonal matrices, the elements of which depend on a random vector. This idea is not new, but is often couched in geometrical language.² In this respect the treatment in this paper will have something in common with that of Elfving [6] and Ogawa [16]. However, the method followed here does not seem to have appeared in the literature in the same form.

Received June 18, 1956.
¹ This investigation was supported (in part) by a research grant (RG-3666) from the Institutes of Health, U. S. Public Health Service.

416 ROBERT A. WIJSMAN


2. Methods and main results. Notation: Boldface symbols denote matrices and column vectors, prime denotes transposition, $0$ is a zero vector, $I_n$ an $n \times n$ identity matrix, $Q$ an orthogonal matrix. If $A$ is a square matrix, then $|A|$ denotes the absolute value of its determinant, and $\operatorname{tr} A$ its trace. $\mathfrak{N}(\mu, \Sigma)$ denotes the distribution of a normal random vector with mean $\mu$ and covariance matrix $\Sigma$, and $\mathfrak{N}(0, 1)$ the distribution of a normal variable with zero mean and unit variance. A $\chi^2$ variable with $n$ degrees of freedom is denoted by $\chi^2_n$, an $F$ variable with $n_1$ and $n_2$ degrees of freedom by $F_{n_1, n_2}$. Of a $k \times n$ matrix of variables $u_{ir}$, the $i$-th row is denoted by $U_i$, the $r$-th column by $U_{(r)}$.

The dominant method used throughout this paper is transformation by an 
orthogonal matrix, the elements of which depend on a random vector. The 
usefulness of this method depends on the following lemma. 

Lemma 1. Let $X$ be a random vector with components $x_1 \cdots x_n$, and let $Q(Z)$ be a random $n \times n$ orthogonal matrix whose elements depend in a measurable way on a random vector $Z$ which is independent of $X$. Let $Y = QX$; then if $X$ is $\mathfrak{N}(0, I_n)$, $Y$ is also $\mathfrak{N}(0, I_n)$ and independent of $Z$.

The proof of Lemma 1 follows immediately from the fact that the conditional distribution of $Y$, given $Z$, is $\mathfrak{N}(0, I_n)$ and is therefore independent of $Z$.
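Lemma 1 can be illustrated by simulation. The sketch below takes $Q(Z)$ to be a Householder reflection mapping $Z$ to $(Z'Z)^{1/2}$ times the last unit vector; this is a swapped-in measurable choice, not the rotation construction of Appendix 1. Sample moments give only a weak check (vanishing correlations do not by themselves prove independence), and the sample size, seed and tolerances are assumptions.

```python
import numpy as np

def householder_to_last_axis(z):
    """Orthogonal Q (a Householder reflection) with Q z = |z| e_n; identity if z already lies there."""
    n = z.size
    target = np.zeros(n)
    target[-1] = np.linalg.norm(z)
    v = z - target
    if np.allclose(v, 0.0):
        return np.eye(n)
    return np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)

rng = np.random.default_rng(0)
n, reps = 4, 20000
Z = rng.standard_normal((reps, n))        # the random vector Z
X = rng.standard_normal((reps, n))        # X ~ N(0, I_n), independent of Z
Y = np.array([householder_to_last_axis(z) @ x for z, x in zip(Z, X)])

cov_Y = np.cov(Y, rowvar=False)           # should be close to I_n
cross = Y.T @ Z / reps                    # sample E(Y Z'), should be close to 0
cross_sq = np.corrcoef(Y[:, -1] ** 2, np.linalg.norm(Z, axis=1))[0, 1]  # a nonlinear check
```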

The lemma will usually be applied in cases where X and Z have the same 
number of components, and Q(Z) is defined in such a way that QZ has all but 
its last component equal to zero. In Appendix 1 it will be shown that Q can be 
uniquely defined in a measurable way. 

Throughout this paper we have to consider random matrices the elements of which are independent $\mathfrak{N}(0, 1)$ variables. A $k \times n$ matrix of independent $\mathfrak{N}(0, 1)$ variables $x_{ir}$ $(i = 1 \cdots k,\ r = 1 \cdots n)$ will be denoted by $M^*_{kn}$. We shall assume $k \le n$. The $i$-th row of $M^*_{kn}$ will be denoted by $X_i$ $(i = 1 \cdots k)$. With $M^*_{kn}$ we form the symmetric matrix $A^*_{kn}$,

$$A^*_{kn} = M^*_{kn} (M^*_{kn})', \tag{1}$$

whose $ij$-th element is $X_i X_j'$ $(i, j = 1 \cdots k)$.
Consider the transformation 


$$X_i Q = Z_i \quad (i = 1 \cdots k),$$

in which the orthogonal matrix $Q$ depends on $X_1$ in such a way as to reduce the first $n - 1$ components of $Z_1$ to $0$. The last component of any $Z_i$, that is, $z_{in}$,


² In a course at Stanford University, Dr. Charles M. Stein uses this idea in the derivation of Hotelling's $T^2$ distribution (private communication).





RANDOM ORTHOGONAL TRANSFORMATIONS 417 


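will, in the following, be denoted for short by $z_i$ $(i = 1 \cdots k)$. For $z_1^2$ we have

$$z_1^2 = X_1 X_1', \tag{2}$$

and $z_1^2$ is clearly a $\chi^2_n$ variable. Inserting the identity matrix $I_n = QQ'$ between the two factors on the right-hand side of (1), we obtain

$$A^*_{kn} = M^*_{kn} Q Q' (M^*_{kn})' = \begin{pmatrix} z_1^2 & z_1 z_2 & \cdots & z_1 z_k \\ z_2 z_1 & Z_2 Z_2' & \cdots & Z_2 Z_k' \\ \vdots & \vdots & & \vdots \\ z_k z_1 & Z_k Z_2' & \cdots & Z_k Z_k' \end{pmatrix}. \tag{3}$$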
21ze ZeZe «++ Zpde | 
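Let $Y_i$ $(i = 1 \cdots k - 1)$ be the $(n - 1)$-component vector obtained from $Z_{i+1}$ by deleting its last component $z_{i+1}$:

$$y_{ir} = z_{i+1, r} \quad (i = 1 \cdots k - 1,\ r = 1 \cdots n - 1),$$

so that we have

$$X_{i+1} X_{j+1}' = Z_{i+1} Z_{j+1}' = Y_i Y_j' + z_{i+1} z_{j+1} \quad (i, j = 1 \cdots k - 1). \tag{4}$$

It is now possible to write (3) in the following way:

$$A^*_{kn} = \begin{pmatrix} z_1 & 0 & \cdots & 0 \\ z_2 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ z_k & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & Y_1 Y_1' & \cdots & Y_1 Y_{k-1}' \\ \vdots & \vdots & & \vdots \\ 0 & Y_{k-1} Y_1' & \cdots & Y_{k-1} Y_{k-1}' \end{pmatrix} \begin{pmatrix} z_1 & z_2 & \cdots & z_k \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \tag{5}$$

in which $z_1^2$ is $\chi^2_n$, $z_2 \cdots z_k$ and the $y_{ir}$ $(i = 1 \cdots k - 1,\ r = 1 \cdots n - 1)$ are $\mathfrak{N}(0, 1)$, and all variables are independent (by Lemma 1, since $Q$ depends only on $X_1$). Equation (5) can be written more concisely if we denote by $Z$ the vector with components $z_2 \cdots z_k$:

$$A^*_{kn} = \begin{pmatrix} z_1 & 0' \\ Z & I_{k-1} \end{pmatrix} \begin{pmatrix} 1 & 0' \\ 0 & A^*_{k-1, n-1} \end{pmatrix} \begin{pmatrix} z_1 & Z' \\ 0 & I_{k-1} \end{pmatrix}. \tag{6}$$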


Almost everything will follow from (5) or (6), which is essentially the first step in the Bartlett decomposition. Taking determinants in (6), we get

$$|A^*_{kn}| = z_1^2\, |A^*_{k-1, n-1}|, \tag{7}$$

in which the two factors on the right-hand side are independent, and $z_1^2$ has a $\chi^2_n$ distribution. Upon repeated application of (7) we get the following result:

Lemma 2. The distribution of $|A^*_{kn}|$ is the distribution of the product of $k$ independent $\chi^2$ variables with $n, n - 1, \dots, n - k + 1$ degrees of freedom, respectively.
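The algebra of (6) and (7) can be checked on a single simulated $M^*_{kn}$, and Lemma 2 can be checked in the mean, since a product of independent $\chi^2$ variables with $n, \dots, n - k + 1$ degrees of freedom has expectation $n(n-1)\cdots(n-k+1)$. The Python sketch below uses a Householder reflection for $Q$ (an assumed choice; any measurable orthogonal $Q$ with $X_1 Q = (0, \dots, 0, (X_1X_1')^{1/2})$ would do); the sizes, seed and function names are illustrative.

```python
import numpy as np

def first_step(M):
    """One step of decomposition (6) for a k x n matrix M with rows X_1, ..., X_k.

    Returns z = (z_1, ..., z_k) and the (k-1) x (k-1) matrix of the Y_i Y_j'."""
    k, n = M.shape
    x1 = M[0]
    target = np.zeros(n)
    target[-1] = np.linalg.norm(x1)
    v = x1 - target
    # Householder reflection (assumed choice of Q): symmetric, so X_1 Q = (0, ..., 0, z_1).
    Q = np.eye(n) if np.allclose(v, 0.0) else np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)
    Zm = M @ Q                     # rows Z_i = X_i Q
    Y = Zm[1:, :-1]                # rows Y_i: Z_{i+1} without its last component
    return Zm[:, -1], Y @ Y.T

def rebuild(z, B):
    """Right-hand side of (6): L diag(1, B) L' with L = [[z_1, 0'], [Z, I_{k-1}]]."""
    k = z.size
    L = np.eye(k)
    L[:, 0] = z
    D = np.eye(k)
    D[1:, 1:] = B
    return L @ D @ L.T

rng = np.random.default_rng(1)
k, n = 3, 6
M = rng.standard_normal((k, n))
z, B = first_step(M)

# Lemma 2 in the mean: E|A*_{kn}| = n (n - 1) ... (n - k + 1) = 120 for k = 3, n = 6.
Ms = rng.standard_normal((50000, k, n))
dets = np.linalg.det(Ms @ Ms.transpose(0, 2, 1))
```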





Equation (7) holds for any $k$. Writing (7) with $k$ replaced by $k - h$ $(h \le k - 1)$ and forming ratios (the factor $z_1^2$ is the same in both, since it depends on $X_1$ only), we have

$$\frac{|A^*_{kn}|}{|A^*_{k-h, n}|} = \frac{|A^*_{k-1, n-1}|}{|A^*_{k-h-1, n-1}|}, \tag{8}$$

in which we set by convention $A^*_{0n} = 1$. Upon iteration of (8) and use of Lemma 2 we get the following result.

Lemma 3. The ratio $|A^*_{kn}|\, |A^*_{k-h, n}|^{-1}$, for $h \le k - 1$, is distributed like the product of $h$ independent $\chi^2$ variables with $n - k + h, n - k + h - 1, \dots, n - k + 1$ degrees of freedom, respectively.

Corollary. The ratio $|A^*_{kn}|\, |A^*_{k-1, n}|^{-1}$ is a $\chi^2_{n-k+1}$ variable.
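The corollary also has a familiar regression reading: $|A^*_{kn}|\,|A^*_{k-1,n}|^{-1}$ is the Schur complement $a_{kk} - c'(A^*_{k-1,n})^{-1}c$, where $c$ has components $a_{ik}$ $(i = 1 \cdots k - 1)$, i.e. the residual sum of squares of $X_k$ regressed on $X_1, \dots, X_{k-1}$. The Monte Carlo sketch below (sizes and seed are arbitrary assumptions) checks this identity and compares the sample mean and variance of the ratio with those of $\chi^2_{n-k+1}$, namely $n - k + 1$ and $2(n - k + 1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n, reps = 3, 7, 40000
M = rng.standard_normal((reps, k, n))
A = M @ M.transpose(0, 2, 1)                                    # A*_{kn}, one per replicate
A_prev = A[:, :k - 1, :k - 1]                                   # A*_{k-1,n}: first k-1 rows
ratio = np.linalg.det(A) / np.linalg.det(A_prev)                # |A*_{kn}| / |A*_{k-1,n}|

# The same ratio as a Schur complement a_kk - c' (A*_{k-1,n})^{-1} c.
c = A[:, :k - 1, k - 1]
sol = np.linalg.solve(A_prev, c[..., None])[..., 0]
schur = A[:, k - 1, k - 1] - np.einsum("ri,ri->r", c, sol)
```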

The results so far obtained are sufficient to derive the distributions of $T^2$ and the generalized variance. For the Wishart distribution, however, it is necessary to consider the joint distribution of the $\tfrac{1}{2}k(k + 1)$ distinct elements $a_{ij} = X_i X_j'$ $(i = 1 \cdots k,\ j = i \cdots k)$ of $A^*_{kn}$. The decomposition (5) expresses the $a_{ij}$ as functions of the $\chi^2_n$ variable $z_1^2$, the $k - 1$ $\mathfrak{N}(0, 1)$ variables $z_2 \cdots z_k$, and the $\tfrac{1}{2}k(k - 1)$ variables $b_{ij} = Y_i Y_j'$ $(i = 1 \cdots k - 1,\ j = i \cdots k - 1)$. If the decomposition (5) is continued, then the $a_{ij}$ are expressed as functions of $\tfrac{1}{2}k(k - 1)$ $\mathfrak{N}(0, 1)$ variables and $k$ $\chi^2$ variables with $n, n - 1, \dots, n - k + 1$ degrees of freedom, respectively, all variables being independent.³ The joint distribution of these variables, together with the Jacobian of the transformation, will produce the joint distribution of the $a_{ij}$.

According to (3), the $a_{ij}$ are first expressed as functions of new variables $z_1^2$, $z_j$, and $Z_i Z_j'$ $(i = 2 \cdots k,\ j = i \cdots k)$. Subsequently a new set of variables $b_{ij} = Y_i Y_j'$ $(i = 1 \cdots k - 1,\ j = i \cdots k - 1)$ is introduced, connected with the $Z_i Z_j'$ through (4). The first transformation yields a Jacobian $z_1^{k-1}$ (from $a_{1j} = z_1 z_j$, $j = 2 \cdots k$); the second yields unity. Hence the Jacobian of the transformation from the $\tfrac{1}{2}k(k + 1)$ variables $a_{ij}$ $(i = 1 \cdots k,\ j = i \cdots k)$ to the $\tfrac{1}{2}k(k + 1)$ variables $z_1^2$, $z_2 \cdots z_k$, and $b_{ij}$ $(i = 1 \cdots k - 1,\ j = i \cdots k - 1)$ is $z_1^{k-1}$. Let the density of the $a_{ij}$ $(i = 1 \cdots k,\ j = i \cdots k)$ be denoted by $p(A^*_{kn})$, and the density of the $b_{ij}$ $(i = 1 \cdots k - 1,\ j = i \cdots k - 1)$ by $p(A^*_{k-1, n-1})$. The joint density of the $\mathfrak{N}(0, 1)$ variables $z_2 \cdots z_k$, the $\chi^2_n$ variable $z_1^2$, and the variables $b_{ij}$ is given by


$$(2\pi)^{-(k-1)/2} \exp\Big[-\tfrac{1}{2}\sum_{i=2}^{k} z_i^2\Big] \cdot \Big[2^{n/2}\,\Gamma\Big(\frac{n}{2}\Big)\Big]^{-1} (z_1^2)^{(n/2)-1} \exp\Big[-\tfrac{1}{2} z_1^2\Big] \cdot p(A^*_{k-1, n-1}).$$

Taking the Jacobian $z_1^{k-1}$ into account, we have

$$p(A^*_{kn}) = C'_{kn}\, z_1^{n-k-1} \exp\Big[-\tfrac{1}{2}\Big(z_1^2 + \sum_{i=2}^{k} z_i^2\Big)\Big]\, p(A^*_{k-1, n-1}), \tag{9}$$

with

$$C'_{kn} = (2\pi)^{-(k-1)/2}\, 2^{-n/2} \Big[\Gamma\Big(\frac{n}{2}\Big)\Big]^{-1}. \tag{10}$$

³ If the decomposition (5) is continued, the right-hand side can be written as the product of a triangular matrix and its transpose, the elements in the triangular matrix being independent normal and $\chi$ variables. The decomposition in that form was also obtained by Mauldon [15].


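In order to write (9) so that it can be iterated immediately, we observe first that $z_1^2 = |A^*_{kn}|\, |A^*_{k-1, n-1}|^{-1}$ by (7). Furthermore,

$$z_1^2 + \sum_{i=2}^{k} z_i^2 = \sum_{i=1}^{k} X_i X_i' - \sum_{i=1}^{k-1} Y_i Y_i' = \operatorname{tr} A^*_{kn} - \operatorname{tr} A^*_{k-1, n-1},$$

using (2) and (4). Thus we can write (9) in the following way:

$$p(A^*_{kn})\, |A^*_{kn}|^{-(n-k-1)/2} \exp\Big[\tfrac{1}{2} \operatorname{tr} A^*_{kn}\Big] = C'_{kn}\, p(A^*_{k-1, n-1})\, |A^*_{k-1, n-1}|^{-(n-k-1)/2} \exp\Big[\tfrac{1}{2} \operatorname{tr} A^*_{k-1, n-1}\Big].$$

Since $(n - 1) - (k - 1) - 1 = n - k - 1$, the exponent is the same at every step, and by iteration it follows immediately that

$$p(A^*_{kn})\, |A^*_{kn}|^{-(n-k-1)/2} \exp\Big[\tfrac{1}{2} \operatorname{tr} A^*_{kn}\Big] = \prod_{i=0}^{k-1} C'_{k-i, n-i} = C_{kn},$$

with $C_{kn}$ given by

$$C_{kn} = 2^{-(1/2)kn}\, \pi^{-(1/4)k(k-1)} \prod_{i=0}^{k-1} \Big[\Gamma\Big(\frac{n-i}{2}\Big)\Big]^{-1}. \tag{11}$$

We have then, finally,

$$p(A^*_{kn}) = C_{kn}\, |A^*_{kn}|^{(n-k-1)/2} \exp\Big[-\tfrac{1}{2} \operatorname{tr} A^*_{kn}\Big], \tag{12}$$

with $C_{kn}$ given by (11).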
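Formulas (10) to (12) can be cross-checked numerically. The sketch below evaluates $\log C'_{kn}$ from (10) and $\log C_{kn}$ from (11), confirms $C_{kn} = \prod_{i=0}^{k-1} C'_{k-i, n-i}$ for a few $(k, n)$, and checks that for $k = 1$ the density (12) is the $\chi^2_n$ density. The function names are illustrative; (12) could also be compared with a library Wishart density with identity scale and $n$ degrees of freedom (for example SciPy's `scipy.stats.wishart`), which is not done here.

```python
import math

import numpy as np

def log_C_prime(k, n):
    """log C'_{kn} from (10): (2 pi)^{-(k-1)/2} 2^{-n/2} / Gamma(n/2)."""
    return -0.5 * (k - 1) * math.log(2 * math.pi) - 0.5 * n * math.log(2) - math.lgamma(n / 2)

def log_C(k, n):
    """log C_{kn} from (11)."""
    return (-0.5 * k * n * math.log(2) - 0.25 * k * (k - 1) * math.log(math.pi)
            - sum(math.lgamma((n - i) / 2) for i in range(k)))

def log_density(A, n):
    """log p(A*_{kn}) from (12) for a symmetric positive definite k x k matrix A."""
    k = A.shape[0]
    sign, logdet = np.linalg.slogdet(A)
    if sign <= 0:
        raise ValueError("A must be positive definite")
    return log_C(k, n) + 0.5 * (n - k - 1) * logdet - 0.5 * np.trace(A)

# C_{kn} equals the product of the C'_{k-i,n-i}, i = 0, ..., k-1.
gaps = [abs(log_C(k, n) - sum(log_C_prime(k - i, n - i) for i in range(k)))
        for k, n in [(1, 5), (2, 7), (4, 9)]]
```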


3. Applications. Let $U_{(1)} \cdots U_{(n)}$ be $n$ independent observations on a $k$-component random vector $U$, which is $\mathfrak{N}(\mu, \Sigma)$, with $k \le n - 1$. The components of $U_{(r)}$ will be denoted by $u_{ir}$ $(i = 1 \cdots k,\ r = 1 \cdots n)$. The sample mean is $\bar U = (1/n) \sum_{r=1}^{n} U_{(r)}$, having components $\bar u_1 \cdots \bar u_k$. The sample covariance matrix $S$ has components $s_{ij}$ $(i, j = 1 \cdots k)$ given by

$$s_{ij} = \frac{1}{n-1} \sum_{r=1}^{n} (u_{ir} - \bar u_i)(u_{jr} - \bar u_j). \tag{13}$$


(a) Hotelling's $T^2$. Hotelling's $T^2$ is defined as

$$T^2 = n (\bar U - \mu_0)' S^{-1} (\bar U - \mu_0), \tag{14}$$

in which $\mu_0$ is some specified vector. $T^2$ is not defined on the null set in the sample space on which $S$ is singular.

First consider the case $\mu = \mu_0$. By making the proper transformations it can be shown⁴ that $[1/(n - 1)]T^2$ has the same distribution as $T_1^2$ defined by

⁴ See, for example, [8]. In order to make this paper self-contained, a proof is given in Appendix 2.





$$T_1^2 = X_{(n)}' (A^*_{k, n-1})^{-1} X_{(n)}, \tag{15}$$

in which $X_{(1)} \cdots X_{(n)}$ are independent $\mathfrak{N}(0, I_k)$, and $A^*_{k, n-1}$ is defined by (1). In (15) all the variables are independent and $\mathfrak{N}(0, 1)$. By subjecting the $X_{(r)}$ to an orthogonal transformation with matrix $Q$: $Y_{(r)} = Q X_{(r)}$ $(r = 1 \cdots n)$, we can write (15) also as

$$T_1^2 = Y_{(n)}' (A^{**}_{k, n-1})^{-1} Y_{(n)}, \tag{16}$$

with $A^{**}_{k, n-1} = M^{**}_{k, n-1} (M^{**}_{k, n-1})'$ and $M^{**}_{k, n-1} = Q M^*_{k, n-1}$. The columns of $M^*_{k, n-1}$ are $X_{(1)} \cdots X_{(n-1)}$. Hence, if $Q$ depends only on $X_{(n)}$, then, by Lemma 1, the elements of $M^{**}_{k, n-1}$ are still independent $\mathfrak{N}(0, 1)$, and independent of $Y_{(n)}$. We now choose $Q$ such that the first $k - 1$ components of $Y_{(n)}$ are $0$. The $k$-th component of $Y_{(n)}$ will be denoted by $y$. From (16) it follows that $T_1^2$ equals the product of $y^2$ and the $kk$-th element of $(A^{**}_{k, n-1})^{-1}$, where it has to be remembered that these factors are independent. Now $y^2 = X_{(n)}' X_{(n)}$ is a $\chi^2_k$ variable, and the $kk$-th element of $(A^{**}_{k, n-1})^{-1}$ equals $|A^{**}_{k-1, n-1}|\, |A^{**}_{k, n-1}|^{-1}$, the reciprocal of which is a $\chi^2_{n-k}$ variable by the corollary to Lemma 3. Hence $T_1^2$ is the ratio of two independent $\chi^2$ variables, with $k$ and $n - k$ degrees of freedom, respectively. It follows that $(n - 1)^{-1} k^{-1} (n - k) T^2$ is an $F_{k, n-k}$ variable.
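A Monte Carlo sketch of the central case: with $\mu = \mu_0$, the statistic $(n - 1)^{-1} k^{-1} (n - k) T^2$ should behave like $F_{k, n-k}$, whose mean is $(n - k)/(n - k - 2)$. The values of $\mu_0$, $\Sigma$, $k$, $n$, the seed and the number of replications below are arbitrary assumptions; `np.cov` uses the divisor $n - 1$ by default, matching (13).

```python
import numpy as np

def hotelling_t2(U, mu0):
    """T^2 of (14); U holds one observation per row, S uses the divisor n - 1 as in (13)."""
    n = U.shape[0]
    d = U.mean(axis=0) - mu0
    S = np.cov(U, rowvar=False)
    return n * d @ np.linalg.solve(S, d)

rng = np.random.default_rng(3)
k, n, reps = 3, 12, 20000
mu0 = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])
L = np.linalg.cholesky(Sigma)
F = np.empty(reps)
for r in range(reps):
    U = mu0 + rng.standard_normal((n, k)) @ L.T      # n observations from N(mu0, Sigma)
    F[r] = (n - k) * hotelling_t2(U, mu0) / (k * (n - 1))
```

Here the expected mean is $9/7 \approx 1.29$; shifting the true mean away from `mu0` would instead produce the noncentral case discussed next.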

If $\mu \ne \mu_0$, then, in (15), $X_{(n)}$ no longer has zero mean, with the consequence that $y^2$ is a noncentral $\chi^2_k$ variable. On the other hand, the distribution of $A^*_{k, n-1}$ is unchanged. It follows then that $(n - 1)^{-1} k^{-1} (n - k) T^2$ is a noncentral $F_{k, n-k}$ variable. Its distribution was first derived by Hsu [9].

(b) Wilks' generalized variance. Wilks' generalized variance is defined as $|S|$, the determinant of the sample covariance matrix given by (13). By making the same transformation which led to (15) (see also Appendix 2), we find that

$$(n - 1)\, C S C' = A^*_{k, n-1}, \tag{17}$$

in which $C$ is a nonsingular matrix transforming $\Sigma$ to the identity matrix:

$$C \Sigma C' = I_k. \tag{18}$$

Taking determinants in (17) and (18) and using Lemma 2, we then have immediately the result that $(n - 1)^k |\Sigma|^{-1} |S|$ is distributed like the product of $k$ independent $\chi^2$ variables with $n - 1, \dots, n - k$ degrees of freedom, respectively. The density of this distribution can be obtained easily only for $k = 1$ and $k = 2$. For $k = 3$, expressions in terms of infinite series have been given by Kullback [13].
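Since the result holds for every $\Sigma$, a simulation with an assumed non-identity $\Sigma$ should give $(n - 1)^k |\Sigma|^{-1} |S|$ the mean $(n - 1)(n - 2)\cdots(n - k)$ of the stated product of independent $\chi^2$ variables. The sketch below does this for $k = 2$; all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
k, n, reps = 2, 10, 40000
Sigma = np.array([[3.0, 1.2], [1.2, 2.0]])
L = np.linalg.cholesky(Sigma)
U = rng.standard_normal((reps, n, k)) @ L.T       # reps samples of n observations from N(0, Sigma)
D = U - U.mean(axis=1, keepdims=True)
S = D.transpose(0, 2, 1) @ D / (n - 1)            # sample covariance matrices, as in (13)
W = (n - 1) ** k * np.linalg.det(S) / np.linalg.det(Sigma)
expected = np.prod([n - i for i in range(1, k + 1)])   # (n-1)(n-2)...(n-k)
```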

(c) The Wishart distribution and Bartlett decomposition. The Wishart distribution is the joint distribution of the $\tfrac{1}{2}k(k + 1)$ distinct elements of the sample covariance matrix $S$, given by (13). It is more convenient to study $S_1 = (n - 1)S$. By (17) we have

$$C S_1 C' = A^*_{k, n-1}. \tag{19}$$

The linear transformation (19) relates the sample covariance matrix to $A^*_{k, n-1}$, defined by (1). The decomposition (5) or (6) of $A^*_{k, n-1}$ is essentially the first step in the Bartlett decomposition, giving rise to a $\chi^2_{n-1}$ variable $z_1^2$ and $k - 1$ $\mathfrak{N}(0, 1)$ variables $z_2 \cdots z_k$. If the decomposition is continued, then $A^*_{k, n-1}$, and therefore by (19) also $S_1$, is related in a simple way to $k$ $\chi^2$ variables and $\tfrac{1}{2}k(k - 1)$ $\mathfrak{N}(0, 1)$ variables, all independent, which provides the complete Bartlett decomposition.
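Continuing (6) to the end gives $A^*_{kn} = TT'$ with $T$ lower triangular, the diagonal element in row $i$ being a $\chi$ variable with $n - i + 1$ degrees of freedom and the elements below the diagonal independent $\mathfrak{N}(0, 1)$. The sketch below samples $A^*_{kn}$ this way and, for comparison, directly as $M^*_{kn}(M^*_{kn})'$; both should have mean $nI_k$, off-diagonal variance $n$, and, for $k = 3$, mean determinant $n(n-1)(n-2)$. Indices in the code are 0-based, and the sizes and seed are arbitrary.

```python
import numpy as np

def bartlett_sample(k, n, rng):
    """A = T T' with T lower triangular: T_ii ~ chi with n - i d.f. (0-based i),
    T_ij ~ N(0, 1) for i > j, all independent."""
    T = np.tril(rng.standard_normal((k, k)), -1)
    T[np.diag_indices(k)] = np.sqrt(rng.chisquare(n - np.arange(k)))
    return T @ T.T

rng = np.random.default_rng(5)
k, n, reps = 3, 5, 20000
Ab = np.array([bartlett_sample(k, n, rng) for _ in range(reps)])   # via the decomposition
Md = rng.standard_normal((reps, k, n))
Ad = Md @ Md.transpose(0, 2, 1)                                     # direct A*_{kn} = M M'
```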

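The density of the $\tfrac{1}{2}k(k + 1)$ distinct elements of $A^*_{k, n-1}$ is given by (12), after replacing $n$ by $n - 1$. The Jacobian of the transformation (19) is⁵

$$\left|\frac{\partial A^*_{k, n-1}}{\partial S_1}\right| = |C|^{k+1}, \tag{20}$$

which we find equals $|\Sigma|^{-(k+1)/2}$, using (18). Furthermore, by (18) and (19) we have $|A^*_{k, n-1}| = |\Sigma|^{-1} |S_1|$ and $\operatorname{tr} A^*_{k, n-1} = \operatorname{tr} \Sigma^{-1} S_1$. Substitution of these expressions into (12) gives the Wishart distribution

$$p(S_1) = C_{k, n-1}\, |\Sigma|^{-(n-1)/2}\, |S_1|^{(n-k-2)/2} \exp\Big[-\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} S_1\Big],$$

in which it has to be remembered that $S_1 = (n - 1)S$, and $C_{k, n-1}$ is given by (11).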


4. Acknowledgments. The writer wishes to thank Dr. Henry Scheffé for 
bringing the problem to his attention and for helpful suggestions, and Dr. 
Charles M. Stein for providing some valuable references. 


APPENDICES 


Appendix 1. Let $Q^{(m)} = Q^{(m)}(Z)$ be an $n \times n$ orthogonal matrix $(1 \le m \le n - 1)$ depending on an $n$-component column vector $Z$. If $Z$ has components $z_1 \cdots z_n$, then the elements $\omega^{(m)}_{ij}$ of $Q^{(m)}$ will be defined as follows: $\omega^{(m)}_{ij} = \delta_{ij}$ for $i, j = 1 \cdots m - 1,\ m + 2 \cdots n$; $\omega^{(m)}_{mm} = \omega^{(m)}_{m+1, m+1} = z_{m+1}(z_m^2 + z_{m+1}^2)^{-1/2}$; $\omega^{(m)}_{m+1, m} = -\omega^{(m)}_{m, m+1} = z_m (z_m^2 + z_{m+1}^2)^{-1/2}$; and all other off-diagonal elements vanish. If both $z_m$ and $z_{m+1}$ are equal to $0$, then we define $Q^{(m)}$ to be $I_n$. The effect of $Q^{(m)}(Z)$ applied to $Z$ is that all components of $Z$ remain unchanged, except for the $m$-th and $(m + 1)$-st components, of which orthogonal linear combinations are taken such as to make the $m$-th component equal to $0$ and the $(m + 1)$-st equal to $(z_m^2 + z_{m+1}^2)^{1/2}$. If we put $Q = Q^{(n-1)} \cdots Q^{(1)}$, where $Q^{(1)} = Q^{(1)}(Z)$, $Q^{(2)} = Q^{(2)}(Q^{(1)}Z)$, $\dots$, then the first $n - 1$ components of $QZ$ are zero and the $n$-th is $(\sum_{i=1}^{n} z_i^2)^{1/2}$. $Q$ is clearly measurable.
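The construction of Appendix 1 translates directly into code. In the sketch below (0-based indices; the function names are assumed), `givens_step` builds $Q^{(m)}$ with the entries given above, and `appendix1_Q` forms $Q = Q^{(n-1)} \cdots Q^{(1)}$, each factor being computed from the vector produced by the preceding ones.

```python
import numpy as np

def givens_step(z, m):
    """Q^(m)(z) of Appendix 1 (0-based m): rotate components m, m+1 so that the m-th becomes 0."""
    n = z.size
    Q = np.eye(n)
    r = np.hypot(z[m], z[m + 1])
    if r == 0.0:
        return Q                              # both components zero: Q^(m) = I_n
    c, s = z[m + 1] / r, z[m] / r
    Q[m, m], Q[m, m + 1] = c, -s              # omega_mm, omega_{m,m+1}
    Q[m + 1, m], Q[m + 1, m + 1] = s, c       # omega_{m+1,m}, omega_{m+1,m+1}
    return Q

def appendix1_Q(z):
    """Q = Q^(n-1) ... Q^(1); then Q z = (0, ..., 0, (sum z_i^2)^{1/2})."""
    z = np.asarray(z, dtype=float)
    Q = np.eye(z.size)
    for m in range(z.size - 1):
        Q = givens_step(Q @ z, m) @ Q         # next factor depends on the partially rotated z
    return Q
```

Applied to $z = (0, 0, 1, -2)$, for instance, the first step is the identity by the convention for two zero components, and the result is $(0, 0, 0, \sqrt{5})$.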

Appendix 2. Put $V_{(r)} = C(U_{(r)} - \mu)$, $r = 1, \dots, n$, where $C$ is any nonsingular $k \times k$ matrix. Substitution in (13) and (14) yields the result that $T^2$ retains the expression given by (14), with $(u_{ir} - \mu_i)$ everywhere replaced by $v_{ir}$ ($T^2$ is invariant under nonsingular linear transformations of $U$). If $C$ is chosen such as to transform $\Sigma$ to $I_k$ (see (18)), then the $v_{ir}$ are independent and $N(0, 1)$. It is possible now to write $T_0^2 = [1/(n-1)]T^2$ as follows:

(21) $$T_0^2 = n\bar V'\{M_{k,n}(I_n - A_n)M_{k,n}'\}^{-1}\bar V,$$

in which $A_n$ is an $n \times n$ matrix of which every element equals $1/n$. Let $Q$ be an $n \times n$ orthogonal matrix whose last column has all elements equal to $1/n^{1/2}$.


⁵ See [1], [5]. For completeness, a proof is also indicated in Appendix 3.





422 ROBERT A. WIJSMAN 


We find that $Q'(I_n - A_n)Q = J_n$, where $J_n$ is obtained from $I_n$ by replacing the $nn$-th element by 0. If the variables $x_{ir}$ are related to the $v_{ir}$ by $M^*_{k,n} = M_{k,n}Q$, then the last column $X_{(n)}$ of $M^*_{k,n}$ equals $n^{1/2}\bar V$, and the matrix in braces in (21) is

$$M_{k,n}QQ'(I_n - A_n)QQ'M_{k,n}' = M^*_{k,n}J_n(M^*_{k,n})' = M^*_{k,n-1}(M^*_{k,n-1})' = A_{k,n-1},$$

which proves (15).
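The identities used in Appendix 2 can be spot-checked numerically. The Python sketch below (assuming NumPy) obtains one admissible $Q$ from a QR factorization; that construction is our choice, not the paper's, and any orthogonal matrix with last column $n^{-1/2}(1, \dots, 1)'$ would do. It returns three booleans: $Q'(I_n - A_n)Q = J_n$, the last column of $M_{k,n}Q$ equals $n^{1/2}\bar V$, and $M_{k,n}(I_n - A_n)M_{k,n}'$ equals the Gram matrix of the first $n-1$ columns of $M_{k,n}Q$.

```python
import numpy as np


def orthogonal_with_constant_last_column(n, seed=0):
    """Some n x n orthogonal Q whose last column is (1/sqrt(n), ..., 1/sqrt(n))'."""
    rng = np.random.default_rng(seed)
    base = np.column_stack([np.ones(n), rng.normal(size=(n, n - 1))])
    q, _ = np.linalg.qr(base)
    if q[0, 0] < 0:                 # QR fixes the first column only up to sign
        q = -q
    return np.roll(q, -1, axis=1)   # move that column to the last position


def check_appendix2(n, k, seed=0):
    """Numerically confirm the three identities used in Appendix 2."""
    a_n = np.full((n, n), 1.0 / n)              # every element equals 1/n
    j_n = np.eye(n)
    j_n[-1, -1] = 0.0                           # I_n with the nn-th element set to 0
    q = orthogonal_with_constant_last_column(n, seed)
    m = np.random.default_rng(seed + 1).normal(size=(k, n))  # columns V_(1), ..., V_(n)
    m_star = m @ q
    v_bar = m.mean(axis=1)
    return (
        np.allclose(q.T @ (np.eye(n) - a_n) @ q, j_n),
        np.allclose(m_star[:, -1], np.sqrt(n) * v_bar),
        np.allclose(m @ (np.eye(n) - a_n) @ m.T, m_star[:, :-1] @ m_star[:, :-1].T),
    )
```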

Appendix 3. Since any nonsingular matrix $C$ can be written as the product of elementary matrices, and since both the Jacobian of a composition of transformations and $|C|^{k+1}$ are multiplicative, equation (20) need only be verified for the latter. Multiplication of a square matrix $A$ by an elementary matrix results in one of the following elementary operations on $A$:

(i) interchange of two rows (columns),
(ii) multiplication of a row (column) by a constant $c \ne 0$,
(iii) subtraction of a row (column) from another row (column).

The absolute values of the determinants of the corresponding elementary matrices are $1$, $|c|$, $1$, respectively. The corresponding Jacobians can easily be checked to be $1$, $|c|^{k+1}$, $1$, respectively. This completes the proof.
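Equation (20) can also be checked directly: the map $A \mapsto CAC'$ is linear in the $\tfrac12 k(k+1)$ distinct elements of a symmetric $A$, so its Jacobian is the absolute determinant of a $\tfrac12 k(k+1)$-square matrix. The Python sketch below (assuming NumPy; helper names are ours) builds that matrix column by column, so it can be compared with $|C|^{k+1}$ for a general $C$ and for elementary matrices of types (i) to (iii).

```python
import numpy as np


def vech(a):
    """Distinct elements of a symmetric matrix: lower triangle including the diagonal."""
    return a[np.tril_indices(a.shape[0])]


def unvech(v, k):
    """Symmetric k x k matrix whose distinct elements are v (inverse of vech)."""
    a = np.zeros((k, k))
    a[np.tril_indices(k)] = v
    return a + np.tril(a, -1).T


def congruence_jacobian(c):
    """|Jacobian| of A -> C A C' on the k(k+1)/2 distinct elements of symmetric A.

    The map is linear, so column j of the Jacobian matrix is vech(C E_j C'),
    where E_j is the symmetric basis matrix for the j-th distinct element.
    """
    c = np.asarray(c, dtype=float)
    k = c.shape[0]
    d = k * (k + 1) // 2
    jac = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = 1.0
        jac[:, j] = vech(c @ unvech(e, k) @ c.T)
    return abs(np.linalg.det(jac))
```

For $k = 3$ and $C = \operatorname{diag}(1, c, 1)$ the map scales the six distinct elements by $1, c, c^2, 1, c, 1$, whose product is $c^4 = c^{k+1}$.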


REFERENCES 

[1] A. C. Aitken, "On the Wishart distribution in statistics," Biometrika, Vol. 36 (1949), pp. 59-62.

[2] T. W. Anderson, "Multivariate statistical analysis," mimeographed lecture notes.

[3] M. S. Bartlett, "On the theory of statistical regression," Proc. Roy. Soc. Edinburgh, Vol. 53 (1933), pp. 260-283.

[4] H. Cramér, "Mathematical Methods of Statistics," Princeton University Press, 1946.

[5] W. L. Deemer and I. Olkin, "The Jacobians of certain matrix transformations useful in multivariate analysis," based on lectures by P. L. Hsu, Biometrika, Vol. 38 (1951), pp. 345-367.

[6] G. Elfving, "A simple method of deducing certain distributions connected with multivariate sampling," Skand. Aktuarietids., Vol. 30 (1947), pp. 56-74.

[7] D. Fog, "The geometrical method in the theory of sampling," Biometrika, Vol. 35 (1948), pp. 46-54.

[8] H. Hotelling, "The generalization of Student's ratio," Ann. Math. Stat., Vol. 2 (1931), pp. 360-378.

[9] P. L. Hsu, "Notes on Hotelling's generalized T," Ann. Math. Stat., Vol. 9 (1938), pp. 231-243.

[10] P. L. Hsu, "A new proof of the joint product moment distribution," Proc. Cambridge Philos. Soc., Vol. 35 (1939), pp. 336-338.

[11] A. E. Ingham, "An integral which occurs in statistics," Proc. Cambridge Philos. Soc., Vol. 29 (1933), pp. 271-276.

[12] A. T. James, "Normal multivariate analysis and the orthogonal group," Ann. Math. Stat., Vol. 25 (1954), pp. 40-75.

[13] S. Kullback, "An application of characteristic functions to the distribution problem of statistics," Ann. Math. Stat., Vol. 5 (1934), pp. 264-307.

[14] P. C. Mahalanobis, R. C. Bose, and S. N. Roy, "Normalization of statistical variates and the use of rectangular coordinates in the theory of sampling distributions," Sankhyā, Vol. 3 (1937), pp. 1-40.

[15] J. G. Mauldon, "Pivotal quantities for Wishart's and related distributions, and a paradox in fiducial theory," J. Roy. Stat. Soc., Ser. B, Vol. 17 (1955), pp. 79-85.





RANDOM ORTHOGONAL TRANSFORMATIONS 423 


[16] J. Ogawa, "On the sampling distributions of classical statistics in multivariate analysis," Osaka Math. J., Vol. 5 (1953), pp. 13-52.

[17] I. Olkin, "On distribution problems in multivariate analysis," Inst. of Stat. Mimeo Series, Vol. 43 (1951), pp. 1-126.

[18] C. R. Rao, "Advanced Statistical Methods in Biometric Research," John Wiley and Sons, 1952.

[19] G. Rasch, "A functional equation for Wishart's distribution," Ann. Math. Stat., Vol. 19 (1948), pp. 262-266.

[20] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, Vol. 24 (1932), pp. 471-494.

[21] J. Wishart, "The generalized product moment distribution in sampling from a normal multivariate population," Biometrika, Vol. 20A (1928), pp. 32-52.

[22] J. Wishart, "Proofs of the distribution law of the second order moment statistics," Biometrika, Vol. 35 (1948), pp. 55-57.

[23] J. Wishart and M. S. Bartlett, "The generalized product moment distribution in a normal system," Proc. Cambridge Philos. Soc., Vol. 29 (1933), pp. 260-270.





ON THE DISTRIBUTION OF RANKS AND OF CERTAIN 
RANK ORDER STATISTICS¹


By Meyer Dwass 


Northwestern University and Stanford University 


1. Introduction. Suppose $X_1, \dots, X_m$ and $X_{m+1}, \dots, X_N$ are two independent samples from two possibly different populations, and $R_1, \dots, R_m$ are the ranks of the first $m$ observations in the combined sample and $R_{m+1}, \dots, R_N$ the ranks of the remaining observations. In the first part of the paper, various moment generating functions connected with these ranks are derived. Of particular interest may be the moment generating function of the Wilcoxon statistic. The asymptotic distribution of a finite number of ranks is derived as $N \to \infty$. The remainder of the paper studies certain aspects of the distribution theory of rank order statistics of the form $\sum_{i=1}^m f_N(R_i/N)$. The Wilcoxon statistic and the Hoeffding $c_1$-statistic are special cases of such a statistic. Many previous studies have been devoted to showing its asymptotic normality. The main purpose of the last half of this paper is to show that for certain combinations of sample sizes $m$, $n$, and parent populations, the limiting distribution is non-normal as $m \to \infty$, $n \to \infty$, and $m/N \to 0$.


2. Generating functions for ranks. Throughout this paper we suppose that $X_1, \dots, X_m, X_{m+1}, \dots, X_{m+n}$ are $N = m + n$ independent random variables, the first $m$ identically distributed, each with c.d.f. $F_1$, and the last $n$ identically distributed, each with c.d.f. $F_2$. We suppose these c.d.f.'s are continuous. By the random variable $R_i$, the rank of $X_i$, we mean the number of $X_j$'s less than or equal to $X_i$. The main object of this section is to write an expression for a generating function for ranks, and the following notation is intended to be useful toward that end. Let $u_0 = -\infty$, $u_{r+1} = \infty$, and

$$u_0 < u_1 < \cdots < u_r < u_{r+1}.$$

Then we denote

$$G_{i,j+1} = G_{i,j+1}(u_j, u_{j+1}) = F_i(u_{j+1}) - F_i(u_j) \qquad (i = 1, 2;\ j = 0, \dots, r).$$


Let $i_1, i_2, \dots, i_r$ be a permutation of the $r = p + q$ ($p \le m$, $q \le n$) integers $1, 2, \dots, p, m+1, \dots, m+q$, and let $e_{i_1}, \dots, e_{i_r}$ be defined by

$$e_{i_1} = \begin{cases} 1 & \text{if } i_1 \text{ is one of } 1, \dots, p,\\ 2 & \text{if } i_1 \text{ is one of } m+1, \dots, m+q,\end{cases}$$

with similar definitions for $e_{i_2}, \dots, e_{i_r}$. If $d_1 < d_2 < \cdots < d_r$ is a set of positive integers, they uniquely determine a set of non-negative integers $w_1, \dots, w_r$ such that

(1) $$d_1 = w_1 + 1,\quad d_2 = w_1 + w_2 + 2,\quad \dots,\quad d_r = w_1 + w_2 + \cdots + w_r + r.$$

Conversely, (1) determines for non-negative integers $w_1, \dots, w_r$ a corresponding set $d_1, \dots, d_r$.

Received June 18, 1956; revised October 22, 1956.
¹ Work performed under Office of Naval Research Contract Nonr-225(21).
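The rank convention of Section 2 and the correspondence (1) between $d_1 < \cdots < d_r$ and $w_1, \dots, w_r$ are easy to make concrete. A short Python sketch (assuming NumPy; the function names are hypothetical):

```python
import numpy as np


def ranks(sample):
    """R_i = number of observations less than or equal to the i-th one (Section 2)."""
    x = np.asarray(sample, dtype=float)
    return (x[None, :] <= x[:, None]).sum(axis=1)


def d_to_w(d):
    """Invert (1): w_1 = d_1 - 1 and w_j = d_j - d_{j-1} - 1 for j >= 2."""
    d = list(d)
    if d[0] < 1 or any(b <= a for a, b in zip(d, d[1:])):
        raise ValueError("need positive integers d_1 < d_2 < ... < d_r")
    prev = [0] + d[:-1]
    return [dj - pj - 1 for dj, pj in zip(d, prev)]


def w_to_d(w):
    """Apply (1): d_j = w_1 + ... + w_j + j."""
    total, d = 0, []
    for j, wj in enumerate(w, start=1):
        total += wj
        d.append(total + j)
    return d
```

Thus $w_1$ counts the other observations below the first specified rank and $w_j$ those strictly between the $(j-1)$-st and $j$-th, which is the role they play in (2).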


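We first want to evaluate

$$P\{R_{i_1} = d_1, \dots, R_{i_r} = d_r\}, \qquad d_1 < \cdots < d_r \le N,$$

for positive integers $d_j$. By an elementary computation we can write this probability as

(2) $$(m-p)!\,(n-q)!\sum\int_{u_1 < \cdots < u_r}\frac{G_{11}^{s_1}\cdots G_{1,r+1}^{s_{r+1}}\,G_{21}^{t_1}\cdots G_{2,r+1}^{t_{r+1}}}{s_1!\cdots s_{r+1}!\,t_1!\cdots t_{r+1}!}\,dF_{e_1}(u_1)\cdots dF_{e_r}(u_r),$$

where, to facilitate printing, $e_{i_j}$ is written as $e_j$ when it occurs as a subscript, etc., and where $\sum$ is summation over all non-negative integers $s_j$, $t_j$ such that

$$s_1 + \cdots + s_{r+1} = m - p,\quad t_1 + \cdots + t_{r+1} = n - q,\quad s_1 + t_1 = w_1,\quad s_2 + t_2 = w_2,\quad \dots,\quad s_r + t_r = w_r,$$

the $w_j$ being determined by the $d_j$ as in (1). Next we recognize that (2) equals the coefficient of $T_{i_1}^{w_1}\cdots T_{i_r}^{w_r}$ in

(3) $$\int_{u_1 < \cdots < u_r}(T_{i_1}G_{11} + \cdots + T_{i_r}G_{1r} + G_{1,r+1})^{m-p}\,(T_{i_1}G_{21} + \cdots + T_{i_r}G_{2r} + G_{2,r+1})^{n-q}\,dF_{e_1}(u_1)\cdots dF_{e_r}(u_r).$$

Since we can write

(4) $$T_{i_1}^{w_1}T_{i_2}^{w_2}\cdots T_{i_r}^{w_r} = \frac{1}{T_{i_1}T_{i_2}\cdots T_{i_r}}\left(\frac{T_{i_1}}{T_{i_2}}\right)^{d_1}\left(\frac{T_{i_2}}{T_{i_3}}\right)^{d_2}\cdots\left(\frac{T_{i_{r-1}}}{T_{i_r}}\right)^{d_{r-1}}T_{i_r}^{d_r},$$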


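this suggests that we make the relabelling

(5) $$\theta_{i_1} = T_{i_1}/T_{i_2},\quad \theta_{i_2} = T_{i_2}/T_{i_3},\quad \dots,\quad \theta_{i_{r-1}} = T_{i_{r-1}}/T_{i_r},\quad \theta_{i_r} = T_{i_r},$$

that is, $T_{i_j} = \theta_{i_j}\theta_{i_{j+1}}\cdots\theta_{i_r}$, so that $T_{i_1}T_{i_2}\cdots T_{i_r} = \theta_{i_1}\theta_{i_2}^2\cdots\theta_{i_r}^r$. Substituting from (5) into (3) and denoting the resulting function of $\theta_{i_1}, \dots, \theta_{i_r}$ by $\varphi(i_1, \dots, i_r)$, we have

$$\sum P(R_{i_1} = d_1, \dots, R_{i_r} = d_r)\,\theta_{i_1}^{d_1}\cdots\theta_{i_r}^{d_r} = \theta_{i_1}\theta_{i_2}^2\cdots\theta_{i_r}^r\,\varphi(i_1, \dots, i_r),$$

where $\sum$ is summation over all integers $d_j$ such that $1 \le d_1 < d_2 < \cdots < d_r \le N$. We can now state the following.

THEOREM 1. The generating function of $R_1, \dots, R_p, R_{m+1}, \dots, R_{m+q}$ equals

(6) $$\sum P(R_1 = d_1, \dots, R_{m+q} = d_r)\,\theta_1^{d_1}\cdots\theta_{m+q}^{d_r} = \sum{}'\,\theta_{i_1}\theta_{i_2}^2\cdots\theta_{i_r}^r\,\varphi(i_1, \dots, i_r),$$

where $\sum$ is over all possible integers $d_1, \dots, d_r$ between 1 and $N$ (no two equal to each other) and $\sum'$ is over all permutations $i_1, \dots, i_r$ of the integers $1, \dots, p, m+1, \dots, m+q$.

REMARK. Equality between any two $d_j$'s is equivalent to tied ranks, which is excluded with probability one by the assumption of continuity of $F_1$, $F_2$.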


3. Several special cases. In this section we look at three special cases. They will be referred to again later.

A. The generating function for a single rank. To find the generating function for a single rank, say $R_1$ to be specific, set

$$p = 1,\quad q = 0,\quad r = 1,\quad \theta_1 = \theta$$

in (6). We then obtain that

(7) $$E\theta^{R_1} = \theta\int(\theta G_{11} + G_{12})^{m-1}(\theta G_{21} + G_{22})^n\,dF_1(u) = \theta\int\big((\theta - 1)F_1(u) + 1\big)^{m-1}\big((\theta - 1)F_2(u) + 1\big)^n\,dF_1(u).$$
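Formula (7) is a one-dimensional integral and can be evaluated numerically. Substituting $v = F_1(u)$ turns it into $\theta\int_0^1((\theta-1)v+1)^{m-1}((\theta-1)g(v)+1)^n\,dv$ with $g = F_2 \circ F_1^{-1}$, assuming $F_1$ is continuous and invertible on its support. The Python sketch below (assuming NumPy; `single_rank_pgf` and `g` are our names) uses Gauss-Legendre quadrature on $[0, 1]$.

```python
import numpy as np


def single_rank_pgf(theta, m, n, g, nodes=64):
    """E theta^{R_1} from (7) after the substitution v = F_1(u).

    g(v) = F_2(F_1^{-1}(v)) must accept a NumPy array; it is supplied by the caller.
    """
    x, w = np.polynomial.legendre.leggauss(nodes)   # nodes and weights on [-1, 1]
    v = 0.5 * (x + 1.0)                             # map nodes to [0, 1]
    integrand = ((theta - 1.0) * v + 1.0) ** (m - 1) * ((theta - 1.0) * g(v) + 1.0) ** n
    return theta * 0.5 * float(np.sum(w * integrand))
```

When $F_1 = F_2$ we have $g(v) = v$, and (7) reduces to $(\theta + \theta^2 + \cdots + \theta^N)/N$, the generating function of a rank uniformly distributed on $1, \dots, N$, which provides a check. An unequal pair such as $F_1(u) = u$, $F_2(u) = u^2$ on $[0, 1]$ corresponds to `g = lambda v: v ** 2`.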


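B. The generating function for $R_1, R_2, \dots, R_m$. For this case we set

$$p = r = m,\quad q = 0$$

in (6) and obtain

$$E\,\theta_1^{R_1}\theta_2^{R_2}\cdots\theta_m^{R_m} = \sum\theta_{i_1}\theta_{i_2}^2\cdots\theta_{i_m}^m\int_{u_1 < \cdots < u_m}\big(\theta_{i_2}\cdots\theta_{i_m}(\theta_{i_1} - 1)F_2(u_1) + \cdots + (\theta_{i_m} - 1)F_2(u_m) + 1\big)^n\,dF_1(u_1)\cdots dF_1(u_m),$$

where $\sum$ is over all permutations $i_1, i_2, \dots, i_m$ of $1, 2, \dots, m$.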







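C. The Wilcoxon statistic. This statistic is $R_1 + \cdots + R_m$. In case B above, set $\theta_1 = \theta_2 = \cdots = \theta_m = \theta$ and we find that

$$E\,\theta^{R_1 + \cdots + R_m} = m!\,\theta^{m(m+1)/2}\int_{u_1 < \cdots < u_m}\big(\theta^{m-1}(\theta - 1)F_2(u_1) + \cdots + (\theta - 1)F_2(u_m) + 1\big)^n\,dF_1(u_1)\cdots dF_1(u_m).$$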
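Under the null hypothesis $F_1 = F_2$ the integral above can be bypassed: every $m$-subset of $\{1, \dots, N\}$ is then equally likely to be the set of ranks of the first sample. The Python sketch below uses this standard counting argument (not the paper's formula) to tabulate the exact null law of $W = R_1 + \cdots + R_m$, whose generating function $\sum_s P(W = s)\theta^s$ is what the formula gives in this special case. The function name is hypothetical.

```python
from math import comb


def rank_sum_null_distribution(m, n):
    """Exact law of W = R_1 + ... + R_m when F_1 = F_2.

    Each m-subset of {1, ..., N} is equally likely to be the set of first-sample
    ranks, so P(W = s) = (# m-subsets with sum s) / C(N, m).
    count[j][s] holds the number of j-subsets of the ranks processed so far with sum s.
    """
    big_n = m + n
    max_sum = sum(range(n + 1, big_n + 1))          # the m largest ranks n+1, ..., N
    count = [[0] * (max_sum + 1) for _ in range(m + 1)]
    count[0][0] = 1
    for rank in range(1, big_n + 1):
        for j in range(min(rank, m), 0, -1):        # descending j: each rank used at most once
            row, prev = count[j], count[j - 1]
            for s in range(max_sum, rank - 1, -1):
                row[s] += prev[s - rank]
    total = comb(big_n, m)
    return {s: c / total for s, c in enumerate(count[m]) if c}
```

The smallest attainable value is $m(m+1)/2$, matching the factor $\theta^{m(m+1)/2}$ in the formula, and the mean is $m(N+1)/2$.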


4. Limiting distributions involving a fixed number of ranks.

A. A single rank. From (7) we have that

$$E e^{i\theta R_1/N} = e^{i\theta/N}\int\big((e^{i\theta/N} - 1)F_1(u) + 1\big)^{m-1}\big((e^{i\theta/N} - 1)F_2(u) + 1\big)^n\,dF_1(u).$$

Suppose $m \to \infty$, $n \to \infty$, $m/N \to p$. Since, for $F = F_1, F_2$,

$$\big|(e^{i\theta/N} - 1)F(u) + 1\big|^2 = F(u)^2 + (1 - F(u))^2 + 2(1 - F(u))F(u)\cos(\theta/N) \le 1,$$

and since, as $N \to \infty$,

$$\big((e^{i\theta/N} - 1)F(u) + 1\big)^N \to \exp[i\theta F(u)],$$

we can apply the Lebesgue bounded convergence theorem to conclude that

(8) $$E\exp(i\theta R_1/N) \to \int\exp[i\theta(pF_1(u) + (1 - p)F_2(u))]\,dF_1(u),$$

as $N \to \infty$. Hence $R_1/N$ is asymptotically distributed as a random variable

$$pF_1(X) + (1 - p)F_2(X),$$

where $X$ has c.d.f. $F_1$.

Remarks. (a) Notice that the extremes, $p = 0$ and $p = 1$, are included in this result.

(b) If we do the above computation for $R_j/N$, its limiting characteristic function is given by the right side of (8) for $j = 1, \dots, m$, and by the right side of (8) with $dF_1$ replaced by $dF_2$ for $j = m+1, \dots, m+n$.

(c) A similar analysis shows that $R_{j_1}/N, \dots, R_{j_k}/N$ are asymptotically independent as $N \to \infty$ if $j_1 < j_2 < \cdots < j_k$ are fixed indices which do not depend on $N$.

B. $R_1, \dots, R_m$. We hold $m$ fixed and let $n \to \infty$. Then by the above remarks,

$$E\exp[i(\theta_1R_1/N + \cdots + \theta_mR_m/N)] \to \prod_{j=1}^m\int\exp[i\theta_jF_2(u)]\,dF_1(u)$$

as $n \to \infty$. Thus, $R_1/N, \dots, R_m/N$ are asymptotically independent and each is distributed as a random variable $F_2(X)$, where $X$ has c.d.f. $F_1$. Let us denote the limiting c.d.f. of $R_1/N$ by $S(t)$. That is, there is a c.d.f. $S$ such that at any continuity point $t$ of $S$,

(9) $$P(R_1/N \le t) \to S(t)$$

as $N \to \infty$, for fixed $m$. Notice that in case $F_2$ has an inverse $F_2^{-1}$, then

$$S(t) = F_1(F_2^{-1}(t)).$$
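A small simulation illustrates (9) with $m$ fixed and $n$ large. Purely for illustration we take $F_1 = N(\delta, 1)$ and $F_2 = N(0, 1)$, so that $S(t) = F_1(F_2^{-1}(t)) = \Phi(\Phi^{-1}(t) - \delta)$. The sketch assumes NumPy and Python's `statistics.NormalDist`; the sample sizes, $\delta$ and seed are arbitrary choices.

```python
from statistics import NormalDist

import numpy as np


def simulate_first_rank_fraction(m, n, delta, reps, seed=0):
    """Monte Carlo draws of R_1/N with F_1 = N(delta, 1) and F_2 = N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(delta, 1.0, size=(reps, m))       # first sample, X_1 in column 0
    y = rng.normal(0.0, 1.0, size=(reps, n))         # second sample
    x1 = x[:, :1]
    # rank of X_1 = number of observations <= X_1 (X_1 itself included)
    r1 = (x <= x1).sum(axis=1) + (y <= x1).sum(axis=1)
    return r1 / (m + n)


def limit_cdf_s(t, delta):
    """S(t) = F_1(F_2^{-1}(t)) for the same illustrative F_1, F_2."""
    return NormalDist(delta, 1.0).cdf(NormalDist().inv_cdf(t))
```

With `m=3`, `n=1000` and a few thousand replications, the empirical values of $P(R_1/N \le t)$ should typically land within a few hundredths of $S(t)$.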


5. Limiting distributions of $S_N = \sum_{i=1}^m f_N(R_i/N)$.

In this section we study the asymptotic distribution of rank order statistics of the form

(10) $$S_N = \sum_{i=1}^m f_N(R_i/N),$$

where $f_N(i/N)$ is a real number defined for $i = 1, \dots, N$. We give below a short discussion on why $S_N$ is of interest and on some of the known results regarding its asymptotic distribution.

For convenience, suppose that

$$f_N(1/N) \le f_N(2/N) \le \cdots \le f_N(N/N).$$

Let $H_N(t)$ ($-\infty < t < \infty$) be the c.d.f. of the $N$ numbers $f_N(i/N)$. That is, $H_N(t)$ = proportion of $f_N(i/N)$ less than $t$. Perhaps the most notable example of a statistic of the form (10) is the Wilcoxon statistic, in which case $f_N(t) = t$ ($0 \le t \le 1$) for all $N$. In [3] it was shown that in case $F_1$, $F_2$ depend on a single parameter $\theta$ and $F_1 = F_2$ when $\theta = 0$, then often a test of $H_0: \theta = 0$ against $H: \theta > 0$ based on (10) for suitably chosen $f_N$ is a locally most powerful rank order test (local in the sense that $\theta$ is close to zero). Studies relevant to the asymptotic normality² of (10) can be found in [1], [2], [3], [4], [6], [8]. In particular, it may be worthwhile to mention some specific conditions which ensure the asymptotic normality of $S_N$. In each case we assume that $m/N \to p$ as $N \to \infty$ and $0 < p < 1$.

(1) $F_1 = F_2$, $f_N(i/N) = EZ_{N,i}^k$ for some positive integer $k$, where $Z_{N,1} \le \cdots \le Z_{N,N}$ are the ordered values of $N$ independent identically distributed random variables [1].

(2) $F_1 = F_2$. The assumption preceding Theorem 2 below holds and the c.d.f. $H$ has its first two moments [3].

(3) $F_1 \ne F_2$ or $F_1 = F_2$; $f_N(t) = f(t)$, a polynomial in $t$ which does not depend on $N$ [3].

We shall now construct the examples referred to in the introduction. The main tools are Theorems 2 and 3 which follow. In addition to the basic assumptions made at the beginning of Section 2, we assume also throughout the remainder of the paper that there is a c.d.f. $H(t)$ such that at every continuity point of $H(t)$,

$$H_N(t) \to H(t) \quad \text{as } N \to \infty.$$

² Whenever we refer to the asymptotic distribution we mean a limiting distribution of $(S_N - a_N)/b_N$ as $m \to \infty$, $n \to \infty$, for a proper choice of $\{a_N\}$, $\{b_N\}$. It may be that $m$ depends on $n$.


THEOREM 2. Let $t_1, \dots, t_m$ be such that they are continuity points of $H(t)$ and such that $H(t_1), \dots, H(t_m)$ are continuity points of $S(u)$. Then

$$P\{f_N(R_1/N) < t_1, \dots, f_N(R_m/N) < t_m\} \to S(H(t_1))\cdots S(H(t_m))$$

as $N \to \infty$, provided $m$ is fixed.

Proof. Since $f_N(1/N) \le \cdots \le f_N(N/N)$, we can write

$$P\{f_N(R_1/N) < t_1, \dots, f_N(R_m/N) < t_m\} = P\{R_1/N \le H_N(t_1), \dots, R_m/N \le H_N(t_m)\}.$$

The result follows from (9) and the remarks preceding it, since $H_N(t_j) \to H(t_j)$, a continuity point of $S$.

Corollary. Let $\varphi(u)$ be the characteristic function of a random variable with c.d.f. $R(t) = S(H(t))$. Then

$$E\exp i\big(u_1f_N(R_1/N) + \cdots + u_mf_N(R_m/N)\big) \to \varphi(u_1)\cdots\varphi(u_m)$$

as $N \to \infty$, $m$ fixed.
Lemma 1. Let $X_{11}, X_{12}, \dots$; $X_{21}, X_{22}, \dots$ be two infinite sequences of random variables, and let the random variable

$$t_{m,n} = t_{m,n}(X_{11}, \dots, X_{1m}; X_{21}, \dots, X_{2n})$$

be a function of the $m + n$ random variables in parentheses. Let $\varphi_{m,n}(u) = E\exp(iut_{m,n})$. Suppose

(a) There is a characteristic function $\varphi$ such that for every positive integer $m$ and every real $u$,

$$\varphi_{m,n}(u) \to [\varphi(u)]^m$$

as $n \to \infty$ and $m$ is fixed.

(b) There are norming constants $a_m$, $b_m$ and a characteristic function $\psi$ such that for every real $u$

(11) $$\exp(-ia_mu/b_m)\,[\varphi(u/b_m)]^m \to \psi(u)$$

as $m \to \infty$. Then there is a sequence of pairs of positive integers $(1, n(1)), (2, n(2)), \dots, (m, n(m)), \dots$ such that

(12) $$\exp(-ia_mu/b_m)\,\varphi_{m,n}(u/b_m) \to \psi(u)$$

as $m \to \infty$, $n \to \infty$, provided $n = n(m)$.

The proof is elementary and we omit it. We point out that (12) says that the distribution of $(t_{m,n} - a_m)/b_m$ converges to that distribution whose characteristic function is $\psi$.

Lemma 2. Let $R(t) = S(H(t))$ be as defined above. Suppose $0 < F_1(t) < 1$ if and only if $0 < F_2(t) < 1$. Suppose also that $t = F_2^{-1}(u)$, the inverse of $F_2(t)$, is defined for all $0 < u < 1$. Then

(a) if $F_1 = F_2$, $R(t) = H(t)$,

(b) if $F_2 = H$, $R(t) = F_1(t)$.

Proof. The proof follows from the fact that $S(u) = F_1(F_2^{-1}(u))$.
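Lemma 2 can be illustrated by coding $R(t) = S(H(t)) = F_1(F_2^{-1}(H(t)))$ directly. The Python sketch below takes the c.d.f.'s and the inverse as callables; the normal and Cauchy choices are illustrative, and the helper names are ours.

```python
import math
from statistics import NormalDist


def cauchy_cdf(t, loc=0.0):
    """Unit-scale Cauchy c.d.f. centered at loc (an illustrative choice of F_1)."""
    return 0.5 + math.atan(t - loc) / math.pi


def limit_cdf_r(t, f1_cdf, f2_inv, h_cdf):
    """R(t) = S(H(t)) with S(u) = F_1(F_2^{-1}(u)), as in Section 4."""
    u = h_cdf(t)
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return f1_cdf(f2_inv(u))
```

With $F_2 = H = \Phi$ and a Cauchy $F_1$, `limit_cdf_r(t, lambda x: cauchy_cdf(x, 0.7), NormalDist().inv_cdf, NormalDist().cdf)` returns the Cauchy c.d.f. itself, which is case (b) and the situation used in Remark (b) after Theorem 3.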

THEOREM 3. Suppose that if $Y_1, Y_2, \dots$ is an infinite sequence of independent identically distributed random variables, each with c.d.f. $R(t) = S(H(t))$, then there are norming constants $\{a_m\}$, $\{b_m\}$ such that the c.d.f. of $(\sum_{i=1}^m Y_i - a_m)/b_m$ converges to a c.d.f. $L(t)$ as $m \to \infty$. Then there is a sequence $(1, n(1)), \dots, (m, n(m)), \dots$ such that the c.d.f. of

(13) $$\frac{\sum_{i=1}^m f_N(R_i/N) - a_m}{b_m}$$

converges to $L(t)$ as $m \to \infty$, $n \to \infty$, provided that $n = n(m)$.

Proof. Let $t_{m,n} = \sum_{i=1}^m f_N(R_i/N)$. This statistic satisfies Condition (a) of Lemma 1 by the corollary to Theorem 2. (Let $u_1 = \cdots = u_m = u$ in that corollary.) Condition (b) holds by assumption and this completes the proof.

Remarks. (a) An unsatisfactory feature of this result is that it tells nothing about the relative orders of $m$ and $n$. It is clear that we can find sequences $\{m_j\}$, $\{n_j\}$ such that if $m = m_j$, $n = n_j$, then the asymptotic distribution of the proposition holds and

(14) $$\lim_j m_j/(m_j + n_j) = 0.$$

Though our methods here are not sensitive enough to yield this information, the sense of the derivation is such as to make reasonable the conjecture that this asymptotic distribution holds for all sample size sequences for which (14) holds.

(b) By Lemma 2, if $F_2 = H$ then $R(t) = F_1(t)$. By the proper choice of $F_1$ we can determine the limiting distribution $L$ to be any stable distribution. For example, suppose (10) is the Hoeffding $c_1$-statistic [7]. That is, $f_N(i/N) = EZ_{N,i}$, where $Z_{N,1} \le \cdots \le Z_{N,N}$ are the ordered values of $N$ independent $N(0, 1)$ random variables. According to [5], $H$ is the unit normal c.d.f. Now suppose that the alternative to the usual null hypothesis that $F_1 = F_2$ is that $F_2$ is the unit normal c.d.f. and that $F_1$ is the Cauchy c.d.f. centered at $\delta$. Then there are sequences $\{m, n(m)\}$ such that $[\sum_{i=1}^m f_N(R_i/N) - m\delta]/m$ has asymptotically the Cauchy distribution centered at zero. This is so because Lemma 2 (Case (b)) ensures that $R$ is the Cauchy c.d.f. and because an average of independent and identically distributed Cauchy variables is distributed like any one of its components.

(c) In case $H(t)$ concentrates all its mass on a bounded interval, then so does $R(t)$, and, excluding the one-point limiting distribution, the limiting distribution of this theorem must be normal. This will happen if $f_N(t)$ ($0 \le t \le 1$) is a polynomial in $t$ which does not depend on $N$. This is not surprising since for this case [3] shows that if $\lim_{N\to\infty} m/N = p$ exists, then $S_N$ is asymptotically normal for all $0 < p < 1$. As a matter of fact, these results would seem to imply that one should be able to include the extreme cases $p = 0$ and $p = 1$. Similarly, if $F_1 = F_2$ and $H$ has its first two moments, then $S_N$ is asymptotically normal. This is also not surprising since for this case [2] shows asymptotic normality for $0 < p < 1$.

(d) We can construct further examples of non-normal limiting distributions by supposing $F_1 = F_2$ and by choosing $H$ properly, since by Lemma 2(a), $R = H$. This is presumably of lesser interest than the construction given in Remark (b) above.


REFERENCES 
[1] M. Dwass, "On the asymptotic normality of certain rank order statistics," Ann. Math. Stat., Vol. 24 (1953), pp. 303-306.
[2] M. Dwass, "On the asymptotic normality of some statistics used in non-parametric tests," Ann. Math. Stat., Vol. 26 (1955), pp. 334-339.
[3] M. Dwass, "The large sample power of rank order tests in the two-sample problem," Ann. Math. Stat., Vol. 27 (1956), pp. 352-373.
[4] W. Hoeffding, "A combinatorial central limit theorem," Ann. Math. Stat., Vol. 22 (1951), pp. 558-566.
[5] W. Hoeffding, "On the distribution of the expected values of the order statistics," Ann. Math. Stat., Vol. 24 (1953), pp. 93-100.
[6] G. E. Noether, "On a theorem by Wald and Wolfowitz," Ann. Math. Stat., Vol. 20 (1949), pp. 455-458.
[7] M. E. Terry, "Some rank order tests which are most powerful against specific parametric alternatives," Ann. Math. Stat., Vol. 23 (1952), pp. 346-366.
[8] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations," Ann. Math. Stat., Vol. 15 (1944), pp. 358-372.





ESTIMATING FUTURE FROM PAST IN LIFE TESTING 


By Joun E. WALSH 


Military Operations Research Division, Lockheed Aircraft Corp., Burbank, Calif. 


1. Summary. Let $\theta_p$ represent the unique $100p$ per cent point of a continuous statistical population, while $x_r$ is the $r$th smallest value of a sample of size $n$ from this population ($r = 1, \dots, n$). This paper considers estimation of $\theta_p$ on the basis of $x_{r(1)}, \dots, x_{r(m)}$, where the $r(i)$ differ by $O(\sqrt{n+1})$ and do not necessarily have values near $(n+1)p$. Also considered is estimation of $x_R$ on the basis of $x_{r(1)}, \dots, x_{r(m)}$, where the $r(i)$ differ by $O(\sqrt{n+1})$ and do not necessarily have values near $R$. The results are of a nonparametric nature and based on expected value considerations. These estimation procedures may be useful for life-testing situations where time to failure is the variable and some of the items tested have not yet failed when observation is discontinued. Then $\theta_p$ and $x_R$ can be estimated for $p$ and $R$ values which extend a moderate way into the region where sample data are not available. Estimation of the $x_R$ value which would be obtained by continuing to observe the experiment represents a prediction of the future from the past. The results of this paper may be of value in the actuarial, population statistics, operations research, and other fields.


2. Introduction. Let us consider a sampling situation where n items are simul- 
taneously life tested to determine their times to failure. Then the time to failure 
for the first item which fails is the smallest value for this sample of size n. The 
value for the second item to fail is the next to smallest sample value; etc. Thus 
life-testing situations have the property that the r smallest order statistics of a 
sample are determined in advance of the remaining values of the sample. More- 
over, the first r items to fail furnish the r smallest values of the sample of size n, 
even if some or all of the remaining sample values are never determined. Jacobson 
called attention to these valuable properties of life-testing situations in [1]. A de- 
scriptive outline of the life-testing field is given in [2]. 

The property that the $r$ smallest order statistics of a life-testing sample can be obtained without the necessity of determining the remaining sample values can be exploited in many ways. The basis for this exploitation is that substantial time and/or cost can often be saved by stopping a life-testing experiment at some convenient time before all the items have failed. The situation of this type considered here is the estimation of $\theta_p$ on the basis of $x_1, \dots, x_r$ when $(n+1)p > r$, that is, estimation of population percentage points in the region not covered by the available data.

The life-testing property that the $r$ smallest order statistics are determined in advance of the remaining sample values furnishes an opportunity for estimating the future from the past. Suppose that $r$ items have failed up to the present time and it is desired to predict the future time at which the $R$th item of this set will fail ($R > r$). That is, $x_1, \dots, x_r$ are known and estimation of $x_R$ in an expected value sense is desired. This paper derives such an estimate for the case where $r$ and $R$ do not differ too much.

Received June 20, 1956; revised September 14, 1956.

The purpose of this paper is to derive nonparametric expected value estimates 
which are approximately valid for nearly all continuous statistical populations 
of practical interest. These estimates are not intended to be competitive with 
those which can be obtained on the basis of additional information about the 
population sampled. Instead, the nonparametric estimates presented are for use 
when more specialized estimation methods are not warranted. 

Whether $\theta_p$ or $x_R$ is estimated, the arithmetical problem consists in determining the values of $c_1, \dots, c_m$ for a linear function of the form

$$\sum_{i=1}^m c_i x_{r(i)},$$

where $m, r(1), \dots, r(m)$ are given integers. The procedure for obtaining the values of the $c_i$ consists in solving $m$ specified linear equations in $m$ unknowns. Although the emphasis of the paper is on life-testing situations, the results derived are valid for more general types of situations than those where only $x_1, \dots, x_r$ are available and $r < (n+1)p, R$. For estimation of $\theta_p$, knowledge of the values of order statistics $x_{r(1)}, \dots, x_{r(m)}$ such that $r(i) = r(j) + O(\sqrt{n+1})$ and none of the $r(i)$ differ too much from $(n+1)p$ is sufficient. In estimating $x_R$, it is sufficient that $x_{r(1)}, \dots, x_{r(m)}$ are available with $r(i) = r(j) + O(\sqrt{n+1})$ and none of the $r(i)$ differing too much from $R$. Thus the results of this paper can also be used to estimate the past from the present for life-testing situations where the data for the past were lost or not recorded.

Life-testing situations where population properties are of greater interest than sample properties usually involve inanimate objects such as automobile tires, light bulbs, etc. Often considerable savings in time and/or expense can be obtained by deliberately stopping a life test of this type when 80 to 90 per cent rather than all of the items have failed. Through use of the method given in this paper, many of the upper population percentage points of interest can be estimated even though the upper 10 per cent to 20 per cent of the data is truncated.

The future mortality occurrences among the now-surviving members of a given set of items can be of interest for some types of life-testing situations. The future mortality among the survivors of a specified group of persons who have already been observed for some time represents a situation of this nature. Estimates of the future mortality among the survivors of such a group of persons can be valuable in actuarial science, population statistics, and other fields. This paper presents a rather widely applicable procedure for estimating the first time at which a specified number of additional individuals will have died on the basis of the times to death for the individuals who have already died.

An investigation is made of the variances of the derived estimates. Every estimate considered has a variance of the form


p(l — p:)/n{f(6,)) + O(n), 







where $t$ is a specified number which differs from the $r(i)$ by $O(\sqrt{n+1})$, $p_t = t/(n+1)$, and $f(x)$ is the probability density function (pdf) of the statistical population sampled. Thus all the estimates presented are consistent, having standard deviations which are $O(1/\sqrt{n})$.

If all the sample values were available, the corresponding sample percentage point might appear to be the most suitable nonparametric expected value estimate for $\theta_p$. In many cases, however, an estimate for $\theta_p$ of the type given in this paper may have a higher efficiency (i.e., smaller variance) than the corresponding sample percentage point. The variance for the sample percentage point corresponding to $\theta_p$ is


p(l — p)/n{f(@,)} + O(n’). 
If $n$ is large and

$$p_t(1 - p_t)/f(\theta_{p_t})^2 < p(1 - p)/f(\theta_p)^2,$$

the sample percentage point corresponding to $\theta_p$ usually has a lower efficiency than an estimate of the type presented here. This inequality is frequently satisfied for unimodal populations where $\theta_{p_t}$ is more toward the central part of the probability distribution than $\theta_p$.

In deriving the results, $f(x)$ is assumed to exist, be positive, and be of an analytical nature for all $x$ of interest. Such strong restrictions on $f(x)$ are not necessary for the validity of the results presented. However, little generality is gained for practical cases by using weaker restrictions on $f(x)$. There are limitations on the accuracy to which measurements on continuous observations can be made for all applied situations. This data-accuracy limitation indicates that the conditions imposed on $f(x)$ should be acceptable for virtually all practical situations of a continuous type where $\theta_p$ is unique for all $p$.

Section 3 contains a statement of the estimates for $\theta_p$ and $x_R$ along with some restrictions on their use. A numerical example of the application of each type of estimate is given in Section 4. Assuming a standard normal population, the approximate properties stated for these estimates are compared with their exact properties. Section 5 contains the derivations and motivation for the material given in Section 3.


3. Statement of estimates. Let us consider an explicit statement of the method for obtaining estimates of $\theta_p$ and $x_R$. The additional notation used is

$$t = \frac{1}{m}\sum_{j=1}^m r(j), \qquad r = \max_{1 \le i \le m} r(i),$$
$$d(i) = t - r(i) = \text{quantity which is } O(\sqrt{n+1}), \qquad d(i) \ne d(k) \text{ if } i \ne k,$$
$$p_t = t/(n+1), \qquad q_t = 1 - p_t,$$







$$p_R = R/(n+1), \qquad q_R = 1 - p_R,$$
$$A_j = [\tfrac12(p - p_t)]^{j-1}/(j-1)!,$$
-_ [3(pe - pd" Pr qe(j a IG ~ = 


c;=S —— cmenhaenn i -< 3-2 


“a= (; — 1)(—)5 Ale 
cha), 3 = 9 _ Ga a) 


(Gj — 1)! (n+ 1)G-— 1)! 
(j — 1) — 2)(-8)* ale) m 
2(n + Dintd)G—- Di + 1) pege + (De — qe) dt) + d(t) 


; a ahi ‘ 
a in + 1) pre + 4 als) 


$$C(j, i) = C_j[d(i), \tfrac12(p - p_t)], \qquad C'(j, i) = C_j[d(i), \tfrac12(p_R - p_t)],$$
where $j = 1, \dots, m$.


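For specified $n$, $m$, $r(1), \dots, r(m)$, $p$, $R$, the estimates considered and their principal expected value properties are given by

$$E\Big[\sum_{i=1}^m a_ix_{r(i)}\Big] = \theta_p + O(n^{-3/2}) + O\big[|\tfrac12(p - p_t)|^m\big],$$
$$E\Big[\sum_{i=1}^m b_ix_{r(i)}\Big] = E(x_R) + O(n^{-3/2}) + O\big[|\tfrac12(p_R - p_t)|^m\big].$$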

If m = 4, O(n~*”) is replaced by O(n™) in these expressions. The sets of linear 
equations used to determine the values of the $a_i$ and the $b_i$ are

$$\sum_{i=1}^m a_iC(j, i) = A_j, \qquad \sum_{i=1}^m b_iC'(j, i) = B_j \qquad (j = 1, \dots, m).$$
i=1 i=l 
The values for the $a_i$ and the $b_i$ can be conveniently expressed in the form of determinants. This form is especially useful for small values of $m$. Explicitly,

$$a_i = \frac{\text{determinant of the } C(u, v) \text{ with } C(j, i) \text{ replaced by } A_j \text{ for } j = 1, \dots, m}{\text{determinant of the } C(u, v)},$$
$$b_i = \frac{\text{determinant of the } C'(u, v) \text{ with } C'(j, i) \text{ replaced by } B_j \text{ for } j = 1, \dots, m}{\text{determinant of the } C'(u, v)}.$$

If the determinant of the $C(u, v)$ is zero or near zero, a change in the values for the $r(i)$ may be required to ensure that none of the $a_i$ are of too large a magnitude. Usually a change of one value is enough to eliminate this difficulty. The same will be true if the determinant of the $C'(u, v)$ is zero or near zero.
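The determinant formulas are Cramer's rule for the system $\sum_i a_iC(j, i) = A_j$. Because the defining expressions for the $C(j, i)$ are not fully legible in our copy, the Python sketch below (assuming NumPy) simply takes the matrix of $C(j, i)$ values as input; the numbers in the usage note are hypothetical. For larger $m$, `np.linalg.solve` is the numerically preferable route and gives the same answer.

```python
import numpy as np


def weights_by_cramer(c, rhs, tol=1e-12):
    """Solve sum_i a_i C(j, i) = A_j (j = 1..m) by the determinant-ratio formula.

    c[j, i] holds C(j+1, i+1) and rhs[j] holds A_{j+1} (0-based storage).
    a_i = det(C with column i replaced by A) / det(C).
    tol is an arbitrary threshold for a "near zero" determinant.
    """
    c = np.asarray(c, dtype=float)
    rhs = np.asarray(rhs, dtype=float)
    det_c = np.linalg.det(c)
    if abs(det_c) < tol:
        raise ValueError("determinant of the C(u, v) is (near) zero; consider changing the r(i)")
    a = np.empty(len(rhs))
    for i in range(len(rhs)):
        ci = c.copy()
        ci[:, i] = rhs          # the entries C(j, i), j = 1..m, replaced by A_j
        a[i] = np.linalg.det(ci) / det_c
    return a
```

The estimate itself is then `a @ x_obs`, where `x_obs` holds $x_{r(1)}, \dots, x_{r(m)}$; the near-zero guard mirrors the advice above about changing the $r(i)$.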

Let us consider determination of a value for $m$ which seems large enough to ensure that the unstated higher order expected value terms can be neglected. Accuracy to terms of order $n^{-3/2}$ implies that $m \ge 3$. The value used is also required to satisfy the condition


( (3/2 4 
hag pee Oe if @,, estimated, 
log | 3(p — pr) | 
m = * 


~ jlog [max (n~*?, 10) sc 
| 208 ( te yy if tp estimated. 


log | 3(pe — Pr) | 
This second condition for determining m is based on the requirement that 


oF 11 3(p — mr) |”, 
max(n"'*, 10°) = { ; - : 
\| 3(pe — pe) |”, if rz estimated. 


if 6, estimated, 


These two conditions assure that the expected value error of an estimate is 
O{max (n~*?, 10~*)]. The minimum value of m = 3 is acceptable for many of the 
cases encountered. From a computational viewpoint, the method probably 
should not be used if the value of m obtained by this procedure exceeds 10. 

The convergence rates of the expansions used in obtaining estimates for 6, 
depend on 3(p — p,), the d(i)/(m + 1), and the properties of the underlying 
statistical population. In practice, the underlying population properties are usu- 
ally such that convergence is more rapid for p, and p near the center of the dis- 
tribution. On this basis, both | d(z) |/(n + 1)piq: and 4| p — p,|/[min(pq, 
p:qt)| should not be too large. The maximum allowable value for these quan- 
tities is taken to be } for the type of situations considered. This value is not 
overly small but should be satisfactory for a large majority of the practical 
applications. Hence, the method given in this paper for estimating 6, should 
not be used if either 


max |d(i)| > 3p¢(n + 1), 
istsm 


|p — pe| > $ min (pq, piq:). 
On a similar basis, the method for estimating x, should not be used if either 


max | d(i)| > 3p.q(n + 1), 


istsm 


| Pe — De| > $ min (Page, Pr). 


Sometimes the inequality involving the $d(i)$ can be changed from unacceptable to acceptable by using a different value for $m$ which allows a decrease in $\max|d(i)|$.

When $x_1, \ldots, x_r$ are given and $r < (n+1)p$ (for estimates of $\theta_p$) or $r < R$ (for estimates of $x_R$), a recommended selection for the values of the $r(i)$ in both types of estimates is

$$r(i) = r - (m - i)K, \qquad (i = 1, \ldots, m),$$





LIFE TESTING 
where

$$K = \max\!\left(1,\ \text{largest integer contained in } \frac{\sqrt{n+1}}{m}\right).$$


The resulting $r(i)$ differ by $O(\sqrt{n+1})$, are equally spaced, and have desirable properties with respect to the expansions used in deriving the estimates.
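The selection rule can be written as a short helper. The sketch below (the name `order_indices` is illustrative) reproduces the indices used in the numerical example ($n = 20$, $m = 3$, $r = 15$) and shows how the spacing $K$ grows like $\sqrt{n+1}/m$ for a larger sample.

```python
import math

def order_indices(n, m, r):
    """Equally spaced order-statistic indices r(i) = r - (m - i) * K, i = 1..m,
    with K = max(1, largest integer contained in sqrt(n + 1) / m)."""
    K = max(1, math.floor(math.sqrt(n + 1) / m))
    return K, [r - (m - i) * K for i in range(1, m + 1)]

print(order_indices(20, 3, 15))    # (1, [13, 14, 15])
print(order_indices(400, 3, 300))  # (6, [288, 294, 300])
```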

Every estimate of the two types considered has approximately the same variance. For all estimates derived, the variances are of the form

$$\frac{p_tq_t}{n[f(\theta_{p_t})]^2} + O(n^{-3/2}).$$

Thus, each estimate has a standard deviation which is $O(n^{-1/2})$. The order of the standard deviation for an estimate is the reason for neglecting all terms involving $n$ to orders $n^{-3/2}$ and higher in the expected value expressions for these estimates.


4. Numerical Example. To illustrate use of the methods of this paper, let us consider the case where $n = 20$, $x_1, \ldots, x_{15}$ are given, $p = 0.84$, and $R = 17$. The value of $m$ is determined first. This value is the smallest integer which is at least 3 and such that

$$m \ge \frac{\log[\max(n^{-3/2},\,10^{-\cdots})]}{\log|\tfrac12(p - p_t)|} = 1.84, \quad \text{if } \theta_p \text{ estimated},$$

$$m \ge \frac{\log[\max(n^{-3/2},\,10^{-\cdots})]}{\log|\tfrac12(p_R - p_t)|} = 1.70, \quad \text{if } x_R \text{ estimated}.$$

Thus $m = 3$ for both types of estimates.
Next let us evaluate $r(1)$, $r(2)$, and $r(3)$. The value of $K$ is given by

$$K = \max\!\left[1,\ \text{largest integer contained in } \frac{\sqrt{n+1}}{m}\right] = 1.$$

Hence, for both types of estimates

$$r(1) = 13, \quad r(2) = 14, \quad r(3) = 15,$$

since $r(i) = r - (m - i)K$. Thus $t = 14$ and

$$d(1) = 1, \quad d(2) = 0, \quad d(3) = -1.$$
Also the relations

$$\max_{1 \le i \le m} |d(i)| \le \tfrac34\, p_tq_t(n+1),$$
$$|p - p_t| \le \tfrac32 \min(pq,\ p_tq_t),$$
$$|p_R - p_t| \le \tfrac32 \min(p_Rq_R,\ p_tq_t)$$

are easily verified, so that the methods of the paper are applicable for the case considered.







By direct substitution, the values of the $A_i$ and $B_i$ are found to be

$$A_1 = 1.0000, \quad A_2 = 0.0867, \quad A_3 = 0.0038,$$
$$B_1 = 1.0000, \quad B_2 = 0.0715, \quad B_3 = 0.0028,$$

$$\begin{aligned}
C(1,1) &= 1.0000, & C(1,2) &= 1.0000, & C(1,3) &= 1.0000,\\
C(2,1) &= -0.0390, & C(2,2) &= -0.0867, & C(2,3) &= -0.1344,\\
C(3,1) &= 0.0054, & C(3,2) &= 0.0088, & C(3,3) &= 0.0144,
\end{aligned}$$

and

$$\begin{aligned}
C'(1,1) &= 1.0000, & C'(1,2) &= 1.0000, & C'(1,3) &= 1.0000,\\
C'(2,1) &= -0.0238, & C'(2,2) &= -0.0715, & C'(2,3) &= -0.1192,\\
C'(3,1) &= 0.0049, & C'(3,2) &= 0.0076, & C'(3,3) &= 0.0125.
\end{aligned}$$


Consequently,

$$a_1 = \begin{vmatrix} 1.0000 & 1.0000 & 1.0000\\ 0.0867 & -0.0867 & -0.0390\\ 0.0038 & 0.0088 & 0.0054 \end{vmatrix} \Bigg/ \begin{vmatrix} 1.0000 & 1.0000 & 1.0000\\ -0.1344 & -0.0867 & -0.0390\\ 0.0144 & 0.0088 & 0.0054 \end{vmatrix} = 3.38.$$

Similarly, $a_2 = -9.41$ and $a_3 = 7.03$. Also

$$b_1 = \begin{vmatrix} 1.0000 & 1.0000 & 1.0000\\ 0.0715 & -0.0715 & -0.0238\\ 0.0028 & 0.0076 & 0.0049 \end{vmatrix} \Bigg/ \begin{vmatrix} 1.0000 & 1.0000 & 1.0000\\ -0.1192 & -0.0715 & -0.0238\\ 0.0125 & 0.0076 & 0.0049 \end{vmatrix} = 1.49.$$

In a like fashion, $b_2 = -4.99$ and $b_3 = 4.50$.
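The Cramer's-rule step can be reproduced in a few lines of plain Python (`det3` and `cramer3` are illustrative names). Fed the four-place coefficients exactly as displayed in the determinants, with the columns in the order of the denominator determinants, it returns $b \approx (1.50, -4.99, 4.50)$, matching the text, and $a \approx (3.35, -9.33, 6.98)$. The common denominator is only about $1.05 \times 10^{-4}$, so the solution is sensitive to rounding in the coefficients, which plausibly accounts for the small differences from 3.38, −9.41 and 7.03.

```python
def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(M, rhs):
    """Solve M x = rhs by Cramer's rule: replace column k by rhs and divide determinants."""
    D = det3(M)
    return [det3([row[:k] + [rhs[j]] + row[k + 1:] for j, row in enumerate(M)]) / D
            for k in range(3)]

# Columns ordered as in the denominator determinants above.
M_a = [[1.0, 1.0, 1.0], [-0.1344, -0.0867, -0.0390], [0.0144, 0.0088, 0.0054]]
A = [1.0, 0.0867, 0.0038]
M_b = [[1.0, 1.0, 1.0], [-0.1192, -0.0715, -0.0238], [0.0125, 0.0076, 0.0049]]
B = [1.0, 0.0715, 0.0028]

a = cramer3(M_a, A)
b = cramer3(M_b, B)
print([round(v, 2) for v in a])  # [3.35, -9.33, 6.98]
print([round(v, 2) for v in b])  # [1.5, -4.99, 4.5]
```

Because the first equation has all coefficients equal to 1 and right-hand side 1, any solution automatically satisfies $\sum a_i = \sum b_i = 1$, the condition used later in the variance derivation.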

Using the values determined for the $a_i$ and $b_i$, approximate expected value estimates are obtained for $\theta_{.84}$ and $x_{17}$. These estimates and their properties are given by

$$\begin{aligned}
E(3.38\,x_{13} - 9.41\,x_{14} + 7.03\,x_{15}) &= \theta_{.84} + O(n^{-3/2}) + O[|\tfrac12(p - p_t)|^3]\\
&= \theta_{.84} + O(0.011) + O(0.00065),\\
E(1.49\,x_{13} - 4.99\,x_{14} + 4.50\,x_{15}) &= E(x_{17}) + O(n^{-3/2}) + O[|\tfrac12(p_R - p_t)|^3]\\
&= E(x_{17}) + O(0.011) + O(0.00036).
\end{aligned}$$

Here the contribution of order $n^{-3/2}$ seems to be much more important than the contribution of order $|\tfrac12(p - p_t)|^3$ or the contribution of order $|\tfrac12(p_R - p_t)|^3$.







To check the expected value accuracy of these estimates, let us consider the special case of a normal distribution with zero mean and unit variance. Using a table of the standardized normal distribution and the results of [3],

$$E(3.38\,x_{13} - 9.41\,x_{14} + 7.03\,x_{15}) = 0.995, \qquad \theta_{.84} = 0.996,$$
$$E(1.49\,x_{13} - 4.99\,x_{14} + 4.50\,x_{15}) = 0.888, \qquad E(x_{17}) = 0.921.$$

Thus the expected values of these two estimates are in rather close agreement 
with the true values for the case of normality. This expected value agreement 
is much closer than is required on the basis of the standard deviations of these 
estimates. For the standardized normal case, 


$$\text{s.d. of } (3.38\,x_{13} - 9.41\,x_{14} + 7.03\,x_{15}) = 1.05,$$
$$\text{s.d. of } (1.49\,x_{13} - 4.99\,x_{14} + 4.50\,x_{15}) = 0.72.$$


Due to the moderately small value of n, these standard deviation values do not 
agree very closely with the asymptotic value of 


$$\sqrt{p_tq_t}\,\big/\,[\sqrt{n}\, f(\theta_{p_t})] = 0.33.$$


The moderately small value of n = 20 was selected for the example in order 
that the results of [3] could be used. 
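The normal-theory check of expected values can be repeated from tabulated means of normal order statistics. The five-place values for $n = 20$ below are quoted from standard tables of normal order-statistic means (of the kind given in [3]) and should be treated as assumed inputs; with them, linearity of expectation reproduces the figures 0.995 and 0.888 and gives $E(x_{17}) \approx 0.921$.

```python
# Expected values of standard normal order statistics for N = 20, largest first.
# These five-place figures are assumed inputs quoted from standard tables.
N = 20
E_TOP10 = [1.86748, 1.40760, 1.13095, 0.92098, 0.74538,
           0.59030, 0.44833, 0.31493, 0.18696, 0.06200]

def expected_order_stat(k):
    """E(x_k) for the k-th smallest of N standard normals, using E(x_k) = -E(x_{N+1-k})."""
    j = N + 1 - k                        # rank counted from the largest
    return E_TOP10[j - 1] if j <= 10 else -E_TOP10[N - j]

def expected_combination(coeffs, ranks):
    """Expected value of sum c_i * x_{r(i)}, by linearity of expectation."""
    return sum(c * expected_order_stat(k) for c, k in zip(coeffs, ranks))

ranks = [13, 14, 15]
print(round(expected_combination([3.38, -9.41, 7.03], ranks), 3))  # 0.995
print(round(expected_combination([1.49, -4.99, 4.50], ranks), 3))  # 0.888
print(expected_order_stat(17))                                     # 0.92098
```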


5. Derivations. Here verification is presented for the expected value and vari- 
ance results stated in Section 3. This verification is based on the material pre- 
sented by David and Johnson in [4]. 

Let $s$ be a number such that $1 \le s \le n$, while $f(x)$ has derivatives of all orders at all points where it is defined and is non-zero at all points considered. Some additional notation is used:

$$F(X) = \int_{-\infty}^{X} f(x)\,dx, \qquad F(X_s) = p_s = \frac{s}{n+1}, \qquad X_s^{(u)} = \left.\frac{d^uX}{dF^u}\right|_{F = p_s},$$

$u = 1, 2, \ldots$. Here $X_s^{(0)} = X_s$, while $X_{(n+1)p} = \theta_p$ whether $(n+1)p$ is an integer or not. On the basis of [4],


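$$\begin{aligned}
E[x_{r(i)}] = X_{r(i)} &+ \frac{p_{r(i)}q_{r(i)}}{2(n+2)}\,X^{(2)}_{r(i)}\\
&+ \frac{p_{r(i)}q_{r(i)}}{(n+2)^2}\left[\tfrac13\big(q_{r(i)} - p_{r(i)}\big)X^{(3)}_{r(i)} + \tfrac18\,p_{r(i)}q_{r(i)}X^{(4)}_{r(i)}\right]
\end{aligned}$$

to terms of order $n^{-2}$, where $p_{r(i)} = r(i)/(n+1)$ and $q_{r(i)} = 1 - p_{r(i)}$. Also

$$E(x_R) = X_R + \frac{p_Rq_R}{2(n+2)}\,X_R^{(2)} + O(n^{-2}),$$

on the basis of [4].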
The first step of the procedure used for developing the estimate of $\theta_p$ consists in expanding the $E[x_{r(i)}]$ and $\theta_p$ about a probability value which is halfway between $p_t$ and $p$. The value $p_t$ is considered because of the relation

$$E[x_{r(i)}] = \theta_{p_t} + O(1/\sqrt{n}), \qquad (i = 1, \ldots, m).$$

Taylor series expansion about the midway probability value of $\tfrac12(p + p_t)$ yields desirable convergence properties for both the $E[x_{r(i)}]$ and $\theta_p$. In particular, these expansions have about the same rate of convergence. Similarly, the first step in developing the estimate for $x_R$ consists in expanding the $E[x_{r(i)}]$ and $E(x_R)$ about the midway probability value of $\tfrac12(p_R + p_t)$. Here the probability $p_R$ is considered because of the relation $E(x_R) = \theta_{p_R} + O(n^{-1})$.

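Next let us consider the expansion of $E[x_{r(i)}]$ about the general probability value of $\tfrac12(Q + p_t)$. By use of Taylor series,

$$X^{(u)}_{r(i)} = \sum_{v=0}^{\infty} \frac{(-1)^v\,[\tfrac12(n+1)(Q - p_t) + d(i)]^v}{(n+1)^v\, v!}\, X^{(u+v)}_{\frac12[(n+1)Q + t]}, \qquad (u = 0, 1, \ldots),$$

since

$$\frac{d^v X_s^{(u)}}{ds^v} = \frac{X_s^{(u+v)}}{(n+1)^v}, \qquad (v = 1, 2, \ldots).$$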
Substitution of these relations into the expression given for $E[x_{r(i)}]$ shows that

$$E[x_{r(i)}] = \sum_{u=1}^{m} C_u\!\left[d(i), \tfrac12(Q - p_t)\right] X^{(u-1)}_{\frac12[(n+1)Q + t]} + O(n^{-3/2}) + \sum_{w=0}^{m} O\{n^{-w/2}\,|\tfrac12(Q - p_t)|^{m-w}\}.$$

Here the $O(n^{-3/2})$ and/or the $O[|\tfrac12(Q - p_t)|^m]$ terms are the most important of those which are not explicitly stated.
Now the expansions for $\theta_p$ and $E(x_R)$ are considered. The Taylor series expansion of $\theta_p$ about the probability value of $\tfrac12(p + p_t)$ is

$$\theta_p = \sum_{u=0}^{\infty} \frac{1}{u!}\,[\tfrac12(p - p_t)]^u\, X^{(u)}_{\frac12[(n+1)p + t]}.$$

To obtain the expansion of $E(x_R)$ to $O(n^{-2})$, the Taylor series expansion of the $X_R^{(v)}$ about the probability value of $\tfrac12(p_R + p_t)$ is needed. This is given by

$$X_R^{(v)} = \sum_{u=0}^{\infty} \frac{1}{u!}\,[\tfrac12(p_R - p_t)]^u\, X^{(u+v)}_{\frac12(R + t)}, \qquad (v = 0, 1, \ldots).$$
Substitution of these relations into the expression given for $E(x_R)$ shows that

$$E(x_R) = \sum_{u=0}^{\infty}\left\{\frac{[\tfrac12(p_R - p_t)]^u}{u!} + \frac{[\tfrac12(p_R - p_t)]^{u-2}\,p_Rq_R\,u(u-1)}{2(n+2)\,u!}\right\} X^{(u)}_{\frac12(R + t)}$$

plus terms of order $n^{-2}$.
To determine the equations which are used to evaluate $a_1, \ldots, a_m$, first set $Q$ equal to $p$. Then the coefficient of $X^{(u)}_{\frac12[(n+1)p + t]}$ in the expansion for

$$E\Big(\sum_i a_i x_{r(i)}\Big)$$

is required to be the same as the coefficient of this quantity in the expansion of $\theta_p$, $(u = 0, 1, \ldots, m - 1)$, to terms of the prescribed order. Examination of the expansions for $E[x_{r(i)}]$ and $\theta_p$ shows that the $m$ linear equations in $m$ unknowns given in Section 3 for evaluating the $a_i$ satisfy this requirement when $m \ge 3$. If $m \ge 4$, the terms of order $n^{-3/2}$ are also cancelled out.

To determine the equations used to evaluate $b_1, \ldots, b_m$, set $Q$ equal to $p_R$. Then the coefficient of $X^{(u)}_{\frac12(R + t)}$ in the expansion for $E(\sum_i b_i x_{r(i)})$ is required to be the same as the coefficient of this quantity in the expansion for $E(x_R)$, $(u = 0, 1, \ldots, m - 1)$, to terms of the prescribed order. Examination of the expansions for $E[x_{r(i)}]$ and $E(x_R)$ shows that the equations given in Section 3 for evaluating the $b_i$ satisfy this requirement when $m \ge 3$. If $m \ge 4$, the terms of order $n^{-3/2}$ also cancel out.

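Finally let us consider the variance expressions for the type of estimates considered. Let the $d(i)$ be numbered so that $d(1) > d(2) > \cdots > d(m)$. Then, using the variance results presented in [4] and the general notation $c_i$ to represent either the $a_i$ or the $b_i$,

$$\begin{aligned}
\operatorname{var}\Big(\sum_{i=1}^{m} c_i x_{r(i)}\Big) ={}& \sum_{i=1}^{m} \frac{c_i^2}{n+2}\left[p_tq_t + (p_t - q_t)\frac{d(i)}{n+1}\right]\left[X_t^{(1)} - \frac{d(i)}{n+1}X_t^{(2)}\right]^2\\
&+ 2\sum_{i<j} \frac{c_ic_j}{n+2}\left[p_t - \frac{d(i)}{n+1}\right]\left[q_t + \frac{d(j)}{n+1}\right]\left[X_t^{(1)} - \frac{d(i)}{n+1}X_t^{(2)}\right]\left[X_t^{(1)} - \frac{d(j)}{n+1}X_t^{(2)}\right] + O(n^{-2}).
\end{aligned}$$

This follows from the fact that all of the $a_i$, $b_i$, and $d(i)/\sqrt{n+1}$ are $O(1)$ with respect to $n$. Using the condition $\sum_{i=1}^{m} c_i = 1$, which holds in all cases for the estimates derived, it is easily verified that

$$\operatorname{var}\Big(\sum_{i=1}^{m} c_i x_{r(i)}\Big) = \frac{p_tq_t}{n[f(\theta_{p_t})]^2} + O(n^{-3/2}),$$

where the relation $X_t^{(1)} = 1/f(\theta_{p_t})$ is used.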


REFERENCES 


[1] Paul H. Jacobson, "The relative power of three statistics," Jour. Amer. Stat. Assoc., Vol. 42 (1947), pp. 575-84.

[2] B. Epstein and M. Sobel, "Life testing," Jour. Amer. Stat. Assoc., Vol. 48 (1953), pp. 486-502.

[3] D. Teichroew, "Tables of expected values of order statistics and products of order statistics for samples of size twenty or less from the normal distribution," Ann. Math. Stat., Vol. 27 (1955), pp. 410-26.

[4] F. N. David and N. L. Johnson, "Statistical treatment of censored data. Part I. Fundamental formulae," Biometrika, Vol. 41 (1954), pp. 228-40.





PROBABILITY DISTRIBUTIONS OF RANDOM VARIABLES 
ASSOCIATED WITH A STRUCTURE OF THE SAMPLE 
SPACE OF SOCIOMETRIC INVESTIGATIONS¹


By Leo Katz and James H. Powell
Michigan State University and Western Michigan College 


1. Summary. In this paper, we consider a disjoint decomposition, at three 
levels, of the total sample space for n-person, one-dimensional sociometric in- 
vestigations. This results in a structure particularly suited to determination of 
the probability distributions of a large class of sociometric variables. Systematic 
methods for obtaining these distributions are presented and illustrated by two 
examples; while the first is trivial, the second produces a previously unknown 
result. 

It should be remarked that the methods developed here have application in 
the theory of communication networks and, indeed, in the study of any network 
situations which may be represented by either of the two models employed in 
the paper. 


2. Introduction. The simplest model for the organization of a group of indi- 
viduals is one-dimensional, in the sense that organization for only one activity 
of the group is considered. Connections between ordered pairs of individuals are 
represented by non-reflexive binary relations. Although a binary model appears 
superficially to be too barren to show adequately the richness of variability of 
the response of one individual to another, it is by no means trivial and is pre- 
cisely the model used in most sociometric investigations, where the relations are 
lines of communication, authority, liking, etc. 

In this model, a particular organization of $n$ individuals has two isomorphic representations, both of which have been used extensively in the literature for descriptive purposes. The older of the two is the linear directed graph on $n$ points, $P_1, P_2, \ldots, P_n$. A connection from man $i$ to man $j$ is represented by a directed line from $P_i$ to $P_j$, $P_i \to P_j$; the absence of such a connection, by no line from $P_i$ to $P_j$. The equivalent matrix representation is an $n \times n$ matrix, $C = (c_{ij})$, where $c_{ij} = 1$ if a connection exists from man $i$ to man $j$, and $c_{ij} = 0$ otherwise. By convention, $c_{ii} = 0$. Obviously, $c_{ij} = 1$ (or 0) if and only if a directed line exists (or doesn't) from $P_i$ to $P_j$. Hence, the two representations are isomorphic.

To fix the notation, let $r_i = \sum_j c_{ij}$ be the $i$th row total of $C$ and $s_j = \sum_i c_{ij}$ be the $j$th column total. In the graph, $r_i$ is the number of lines issuing from the point $P_i$, and $s_j$ is the number of lines terminating on the point $P_j$. Moreover, $\sum_i r_i = \sum_j s_j = t$, the total number of directed lines. Finally, let the vectors


Received May 24, 1956; revised October 18, 1956. 
¹ This work was supported by the Office of Naval Research under contract NR 170-115





PROBABILITY DISTRIBUTIONS 443 


$\rho$ and $\sigma$, with elements $r_i$ and $s_j$, respectively, be the two $n$-part, non-negative, ordered partitions of $t$ which represent, respectively, the marginal row and column totals of $C$.

Unless otherwise noted in the sequel, all graphs will be on n points and lin- 
early directed (n-graphs), and all matrices will be n X n hollow matrices of 
1’s and 0’s. (A matrix is hollow if all principal diagonal elements vanish.) 


3. Decomposition of the sample space. The sample space of the possible organizations of an $n$-member group is the space of all possible $n$-graphs or $n \times n$ hollow matrices of 1's and 0's. In this section, we consider a decomposition of the total sample space, $\Omega$, following lines which hold promise of utility for certain investigations. We define first-order disjoint subspaces, $\Omega_t$, $t = 0, 1, \ldots, n(n-1)$, as the collections of $n$-graphs containing exactly $t$ lines. Obviously,

$$\Omega = \bigcup_{t=0}^{n(n-1)} \Omega_t, \tag{1}$$

since the $\Omega_t$ are mutually exclusive and exhaustive.

Continuing in the same vein, we define second-order subspaces, $\omega(\rho)$, $\rho = (r_1, r_2, \ldots, r_n)$, as the collections of graphs with $r_i$ lines emanating from $P_i$, $i = 1, \ldots, n$. Since $\sum_i r_i = t$, we have

$$\Omega_t = \bigcup_{(\rho)_t} \omega(\rho), \tag{2a}$$

where $(\rho)_t$ is a generic symbol for non-negative, integral, ordered, $n$-part partitions of $t$ with all $r_i \le n - 1$. In a completely dual manner, we might define alternative second-order subspaces, $\omega(\sigma)$, in terms of $n$-graphs with $s_j$ lines converging on $P_j$. In this case, we would have

$$\Omega_t = \bigcup_{(\sigma)_t} \omega(\sigma). \tag{2b}$$

Third-order subspaces are defined by $\omega(\rho, \sigma) = \omega(\rho) \cap \omega(\sigma)$, and are identified with spaces of $n$-graphs with $r_i$ lines emanating from, and $s_i$ lines converging on, $P_i$. Once again, these sets are exclusive and exhaustive in the sense that

$$\omega(\rho) = \bigcup_{(\sigma)_t} \omega(\rho, \sigma), \tag{3a}$$

and

$$\omega(\sigma) = \bigcup_{(\rho)_t} \omega(\rho, \sigma). \tag{3b}$$


We remark that double and triple disjoint decompositions of the larger spaces 
may also be indicated. 

It will be obvious to the reader that there exist isomorphisms among certain 
of these second and third-order subspaces. It will be less obvious, but important 
for computations, that these isomorphisms involve simultaneous permutations 
on the elements of the two vectors $\rho$ and $\sigma$. We shall not elaborate on this point
since it contributes little to the notions with which we are here concerned. 


4. Random variables associated with the structure of the sample space. The 
decomposition described in the previous section imposes a structure on the sample 





444 LEO KATZ AND JAMES H. POWELL 


space. In most sociometric investigations, involving randomness in the existence 
of connections between ordered pairs of individuals, it has been deemed appro- 
priate to assign uniform probability to each of the points in a third-order sub- 
space, at least. In more extreme cases (the vast majority) it is customary to 
assume that every possible sample point is equally likely. Sometimes this has 
been done without even specifying which sample points are possible under the 
conditions of the experiment. 

In the context of the particular experiment, it is usually possible for the 
experimenter to determine that his universe of discourse consists of $\Omega$ or one of
the smaller subsets we have described. If, also, it happens that the random 
variable under discussion assumes the same value over all the points of each of 
certain smaller subspaces, the assumption of uniformity of probability within 
these subspaces will produce the complete probability distribution of the vari- 
able. In this section, we investigate these circumstances. 

We say that a random variable defined over $n$-graphs or $n \times n$ hollow matrices is associated with the sample space structure of the previous section if the value of the variable is constant over all points in every $\omega(\rho, \sigma)$ contained in the domain of definition of the variable. Every such variable has a probability distribution which is completely specified as soon as we are able to count the numbers of points in the appropriate subspaces, assuming uniformity of probability on each point. In the next section, we shall present methods for carrying out this enumeration. A variable associated with the structure in the sense of the present definition is necessarily one whose value is somehow determined by, i.e., is a function of, the $r_i$ and $s_j$, $i, j = 1, 2, \ldots, n$, alone. Indeed, this may be taken as an alternative definition.

To establish that the class of variables associated with the structure has some real substance, we examine a few variables which have been the subjects of sociometric investigations. Gross expansiveness, or average level of expansiveness, has been defined in terms of $t$ alone in the context of the space $\Omega$. Variability in expansiveness is defined as a function (usually a sum of squares) of the $r_i$, $i = 1, 2, \ldots, n$, sometimes in the context of $\Omega$ and sometimes in $\Omega_t$. A number of variables have been defined as functions of the $s_j$, $j = 1, 2, \ldots, n$, in various contexts ranging down to $\omega(\rho)$. Examples are (1) the number of isolates, i.e., the number of $s_j = 0$, and (2) the choice status of the most highly chosen, i.e., $\max_j s_j$. Both of these are usually studied in the context of some $\omega(\rho)$.


5. Enumeration of the points in various subspaces. In considering the problems of enumeration, it will be more convenient to use the matrix representation because of its more flexible notation. Thus, the total number of matrices (graphs) in $\Omega$ is the number of ways in which the $n(n-1)$ elements of $C$ may be specified as either zero or one. By elementary considerations, the number of distinct ways this can be done is

$$2^{n(n-1)}. \tag{4}$$







The matrices in $\Omega_t$ have $t$ ones distributed over $n(n-1)$ positions; the number of ways this can be accomplished is the number of ways of specifying a particular $t$ of the $n(n-1)$ positions. Therefore, the number of matrices (graphs) in $\Omega_t$ is given by

$$n_t = \binom{n(n-1)}{t}, \tag{5}$$

where $\binom{a}{b}$, $b \le a$, is the binomial coefficient $a!/[b!(a-b)!]$. As is well known,

$$\sum_t n_t = \sum_t \binom{n(n-1)}{t} = 2^{n(n-1)}.$$

The enumeration of matrices in $\omega(\rho)$ is accomplished by considering, for each $i$, $r_i$ ones distributed over $(n-1)$ positions. This can be done, independently, for each $i$, in $\binom{n-1}{r_i}$ ways, and thus the total number of matrices (graphs) in $\omega(\rho)$ is given by

$$n(\rho) = \prod_{i=1}^{n} \binom{n-1}{r_i}. \tag{6a}$$

By a similar argument, the number of matrices in $\omega(\sigma)$ is given by

$$n(\sigma) = \prod_{j=1}^{n} \binom{n-1}{s_j}. \tag{6b}$$

It is easily seen that

$$\sum_{(\rho)_t} n(\rho) = \sum_{(\sigma)_t} n(\sigma) = n_t.$$
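For small $n$ the counts (4), (5), (6a) and the final identity can be confirmed by brute force. The sketch below (the name `check_counts` is illustrative) enumerates every hollow 0-1 matrix for a given $n$, which is practical only for $n$ of about 3 or 4; the last loop checks the identity independently of the enumeration, as a Vandermonde-type sum of products of binomial coefficients.

```python
from itertools import product
from math import comb, prod

def check_counts(n):
    """Brute-force check of formulas (4), (5), (6a) and of sum over (rho)_t of n(rho) = n_t."""
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]   # off-diagonal positions
    N = len(cells)                                                  # n(n - 1)
    by_t = [0] * (N + 1)
    by_rho = {}
    for bits in product((0, 1), repeat=N):
        rows = [0] * n
        for (i, _), bit in zip(cells, bits):
            rows[i] += bit
        by_t[sum(bits)] += 1
        by_rho[tuple(rows)] = by_rho.get(tuple(rows), 0) + 1
    assert sum(by_t) == 2 ** N                                      # (4)
    assert all(by_t[t] == comb(N, t) for t in range(N + 1))        # (5)
    assert all(c == prod(comb(n - 1, r) for r in rho)              # (6a)
               for rho, c in by_rho.items())
    for t in range(N + 1):                                          # the identity, via (6a)
        total = sum(prod(comb(n - 1, r) for r in rho)
                    for rho in product(range(n), repeat=n) if sum(rho) == t)
        assert total == comb(N, t)
    return True

print(check_counts(3), check_counts(4))  # True True
```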


The only difficult counting problem arises when we attempt to compute the number of points in $\omega(\rho, \sigma)$. This problem was solved by the authors [3], who showed that this number is given by

THEOREM.

$$n(\rho, \sigma) = A\Big\{\prod_{i=1}^{n} (1 + \delta_i)^{-1}(\rho, \sigma)\Big\},$$

where the $\delta_i$ are operators on the pair of vectors defined by

$$\delta_i(r_1, \ldots, r_i, \ldots, r_n;\ s_1, \ldots, s_i, \ldots, s_n) = (r_1, \ldots, r_i - 1, \ldots, r_n;\ s_1, \ldots, s_i - 1, \ldots, s_n),$$

the symbol $A\{\sum_\alpha a_\alpha(\rho_\alpha, \sigma_\alpha)\}$ stands for $\sum_\alpha a_\alpha A(\rho_\alpha, \sigma_\alpha)$, and $A(\rho_\alpha, \sigma_\alpha)$ is the coefficient of the monomial symmetric function of order corresponding to $\sigma_\alpha$ in the expansion of the unitary (elementary) symmetric function of order corresponding to $\rho_\alpha$.

We note that the coefficients $A(\rho_\alpha, \sigma_\alpha)$ are given in tables of David and Kendall [1] for $\rho_\alpha$ and $\sigma_\alpha$ partitions of $t$ up to $t = 12$. P. V. Sukhatme [5] gave an algorithm for computing $A(\rho_\alpha, \sigma_\alpha)$ for any weight and showed that $A(\rho, \sigma)$ is the number of matrices of elements $c_{ij} = 0$ or 1 with fixed row totals $r_i$ and column totals $s_j$ but without restrictions on the diagonal elements. We present a very much abbreviated alternative to the proof previously given by the authors in the paper cited above.²

Proof. $G_n = \prod_{i,j=1}^{n} (1 + x_iy_j)$ generates the $A(\rho_\alpha, \sigma_\alpha)$ as coefficients of terms $\prod_i x_i^{r_{i\alpha}} \prod_j y_j^{s_{j\alpha}}$, and we may write

$$G_n = \sum_\alpha A(\rho_\alpha, \sigma_\alpha) \prod_i x_i^{r_{i\alpha}} \prod_j y_j^{s_{j\alpha}},$$

where the sum extends over all $\alpha$ such that

$$0 \le r_{i\alpha} \le n, \qquad 0 \le s_{j\alpha} \le n, \qquad \sum_i r_{i\alpha} = \sum_j s_{j\alpha}.$$

This is most easily seen if each $c_{ij}$ in a matrix $C$ of 0's and 1's is represented as $(x_iy_j)^{c_{ij}}$. Then each term in the formal expansion of $G_n$ represents one complete configuration of all the $c_{ij}$, simultaneously. Finally, in each individual term, the total exponent of $x_i$ ($y_j$) is the sum $\sum_j c_{ij} = r_i$ ($\sum_i c_{ij} = s_j$), and the coefficient $A(\rho_\alpha, \sigma_\alpha)$ is the number of distinct configurations of the $c_{ij}$ with the indicated row and column totals.

Minor modification of the same reasoning serves to establish that

$$H_n = \prod_{i \ne j} (1 + x_iy_j)$$

is a generating function for the $n(\rho_\alpha, \sigma_\alpha)$.

Next, we observe that

$$H_n = G_n \prod_{i=1}^{n} (1 + x_iy_i)^{-1}.$$

In this equation, the coefficient of $(\prod_i x_i^{r_i} \prod_j y_j^{s_j})$ in the left-hand member is $n(\rho, \sigma)$ and, in the right-hand member, is $\sum_{a_1}\sum_{a_2}\cdots\sum_{a_n} (-1)^{\sum a_i} A(\rho - a, \sigma - a)$, where $a = (a_1, \ldots, a_n)$ and the $a_i$ range over all non-negative integers. Equating these coefficients gives the expanded form of the statement of the theorem.

We note that the last sum in the proof above may be written in finite terms, since, as soon as any $a_i > \min(r_i, s_i)$, the corresponding $A(\rho - a, \sigma - a) = 0$, by the definition of Sukhatme as a number of certain matrices of 0's and 1's.
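The theorem can be checked numerically for small $n$. In the sketch below (function names illustrative), $A(\rho, \sigma)$ is computed directly from Sukhatme's characterization as a count of unrestricted 0-1 matrices with given margins, rather than from the David and Kendall tables; the finite alternating sum from the proof is then compared with a direct count of hollow matrices for every pair of margins with $n = 3$.

```python
from itertools import combinations, product

def count_01(rows, cols, hollow=False):
    """Number of 0-1 matrices with row totals `rows` and column totals `cols`.

    hollow=False: no diagonal restriction, i.e. Sukhatme's A(rho, sigma).
    hollow=True:  zero diagonal, i.e. n(rho, sigma) counted directly.
    """
    n = len(rows)
    if min(rows) < 0 or min(cols) < 0 or sum(rows) != sum(cols):
        return 0

    def fill(i, need):
        # need[j] = ones still required in column j; rows i, i+1, ... remain to be filled
        if i == n:
            return int(not any(need))
        allowed = [j for j in range(n) if need[j] > 0 and not (hollow and j == i)]
        total = 0
        for chosen in combinations(allowed, rows[i]):
            nxt = list(need)
            for j in chosen:
                nxt[j] -= 1
            total += fill(i + 1, nxt)
        return total

    return fill(0, list(cols))


def n_via_theorem(rows, cols):
    """Alternating sum from the proof: sum over a of (-1)^(a_1+...+a_n) A(rho - a, sigma - a),
    with each a_i running only up to min(r_i, s_i), beyond which the terms vanish."""
    total = 0
    for a in product(*(range(min(r, s) + 1) for r, s in zip(rows, cols))):
        sign = -1 if sum(a) % 2 else 1
        total += sign * count_01([r - k for r, k in zip(rows, a)],
                                 [s - k for s, k in zip(cols, a)])
    return total


# Every pair of margins for n = 3 with entries 0..2 (= n - 1):
for rows in product(range(3), repeat=3):
    for cols in product(range(3), repeat=3):
        assert n_via_theorem(rows, cols) == count_01(rows, cols, hollow=True)
print(n_via_theorem((1, 1, 1), (1, 1, 1)))  # 2, the derangements of three objects
```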


6. Probability distributions of associated random variables. It is now clear 
that we have laid down a program for computing, exactly, the probability dis- 
tributions for any and all random variables associated with this structure of 
the sample space. In particular instances, it may be possible to effect certain 
economies in the computations by exploiting the isomorphisms among subsets 
so as to avoid duplication. 

When the variable in question has constant values on subspaces no larger than an $\omega(\rho, \sigma)$, the computations are always formidable, though never impossible. In such circumstances, it would seem desirable to develop approximate


² This alternative proof follows lines of a suggestion by J. S. Frame.







distributions for these variables, treating the exact methods as procedures for 
testing the validity of the approximations over the ranges of group size, etc., 


to be covered. For very small groups, it will usually be feasible to carry out the 
exact computations. 


7. Examples. We shall give two examples of random variables associated with 
the sample space structure. In each, we consider the null case in which each 
graph in the appropriate sample space is equally likely, i.e., a uniform probability 
distribution over the sample space. 

EXAMPLE 1. One measure of gross expansiveness, equal to the total number of choices made by the group divided by the size of the group, is given by Loomis and Proctor [4] in a contribution to Research Methods in Social Relations. In our notation, this index is $F = t/n$.

The distribution problem, in the null case, is easily solved. Clearly, the appropriate sample space is $\Omega$, and our random variable, the number $t$ of lines in an $n$-graph, is constant over the first-order subspaces $\Omega_t$ in the disjoint and exhaustive decomposition of $\Omega$. Thus, our random variable is associated with the sample space structure, and according to Section 4 and the enumeration formulas of Section 5, the required probabilities are given by

$$P(t = k) = \binom{n(n-1)}{k} \Big/ 2^{n(n-1)}, \qquad k = 0, 1, \ldots, n(n-1). \tag{7}$$

EXAMPLE 2. An isolate is an individual represented in the graph by a point, $P_j$, with no terminating lines and in the matrix by a column of zeros, i.e., $s_j = 0$ in the vector $\sigma$. The exact probability distribution of the number of isolates for the case $r_i = d$ $(i = 1, 2, \ldots, n)$ was obtained from first principles by Katz [2] in 1950.

Using the methods already developed, we can now easily extend this result 
to the general case where the ith individual has r; outgoing connections, the r; 
being not necessarily equal. 

The most common setting for this problem is in the sample space $\omega(\rho)$. In the null case, we desire the number of $n$-graphs having a specified number of points with no terminating lines, i.e., a specified number of zeros in the vector $\sigma$. Our random variable, $X$, the number of zero $s_j$'s, is constant over the third-order subspaces $\omega(\rho, \sigma)$ in the decomposition of $\omega(\rho)$; thus, it is associated with the sample space structure. Hence, according to Section 4 and the enumeration formulas of Section 5, the probability of exactly $k$ isolates is given by

$$P(X = k \mid \rho) = \frac{\sum_{(\sigma)_t} I_{A_k}(\sigma)\, n(\rho, \sigma)}{n(\rho)} = \frac{\sum_{(\sigma)_t} I_{A_k}(\sigma)\, A\{\prod_{i=1}^{n} (1 + \delta_i)^{-1}(\rho, \sigma)\}}{\prod_{i=1}^{n} \binom{n-1}{r_i}}, \tag{8}$$






where $A_k$ is the union of $\omega(\rho, \sigma)$ such that the vectors $\sigma$ have exactly $k$ vanishing components, and $I_A$ is the indicator function for the set $A$.

We remark that in some contexts the appropriate sample space might be the larger space $\Omega_t$. However, our enumeration methods will still give us the required probabilities necessary to construct the distribution. In this case, the probability of exactly $k$ isolates is given by

$$P(X = k \mid t) = \frac{\sum_{(\rho)_t}\sum_{(\sigma)_t} I_{A_k}(\sigma)\, n(\rho, \sigma)}{n_t} = \frac{\sum_{(\rho)_t}\sum_{(\sigma)_t} I_{A_k}(\sigma)\, A\{\prod_{i=1}^{n} (1 + \delta_i)^{-1}(\rho, \sigma)\}}{\binom{n(n-1)}{t}}, \tag{9}$$

where the notations are the same as before.
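Distribution (8) can be cross-checked for very small groups by enumerating $\omega(\rho)$ directly, which bypasses the theorem altogether. For $\rho = (1, 1, 1, 1)$ (four individuals, one choice each) the enumeration gives $P(X = 0) = 1/9$, $P(X = 1) = 16/27$ and $P(X = 2) = 8/27$, with mean $4(2/3)^3 = 32/27$. The function name is illustrative, and the approach is feasible only for small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def isolate_distribution(rho):
    """Exact P(X = k | rho), X = number of isolates (zero column totals), when all
    hollow 0-1 matrices with row totals rho are equally likely."""
    n = len(rho)
    # each chooser i picks a set of r_i distinct targets other than itself
    choice_sets = [list(combinations([j for j in range(n) if j != i], r))
                   for i, r in enumerate(rho)]
    counts = Counter()
    for config in product(*choice_sets):          # one point of omega(rho)
        received = [0] * n
        for targets in config:
            for j in targets:
                received[j] += 1
        counts[received.count(0)] += 1
    total = sum(counts.values())                  # equals n(rho) in (6a)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

dist = isolate_distribution((1, 1, 1, 1))
print(dist)  # {0: Fraction(1, 9), 1: Fraction(16, 27), 2: Fraction(8, 27)}
print(sum(k * p for k, p in dist.items()))  # 32/27
```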

Thus, the probability distribution can be constructed for any index (pro- 
posed for the study of group structure) which depends only on the number of 
isolates in the group. Another such index, equal to the reciprocal of the num- 
ber of isolates, is given by Loomis and Proctor [4] as a measure of “group in- 
tegration.” 

Finally, we note that neither of the distributions (8) and (9) have been given 
correctly in the literature. 


REFERENCES 


[1] F. David and M. G. Kendall, "Tables of symmetric functions. Parts II and III," Biometrika, Vol. 38 (1951), pp. 435-462.

[2] L. Katz, "The distribution of the number of isolates in a social group," Ann. Math. Stat., Vol. 23 (1951), pp. 271-276.

[3] L. Katz and J. Powell, "The number of locally restricted directed graphs," Proc. Amer. Math. Soc., Vol. 5 (1954), pp. 621-626.

[4] C. Loomis and C. Proctor, "Analysis of Sociometric Data," Research Methods in Social Relations, Part 2, Chap. 17, The Dryden Press, New York, 1951.

[5] P. V. Sukhatme, "On bipartitional functions," Philos. Trans. Roy. Soc. London, Ser. A, Vol. 237 (1938), pp. 375-409.





DESIGN FOR THE CONTROL OF SELECTION BIAS¹


By David Blackwell and J. L. Hodges, Jr.
University of California, Berkeley 


0. Summary. Suppose an experimenter E wishes to compare the effectiveness 
of two treatments, A and B, on a somewhat vaguely defined population. As 
individuals arrive, E decides whether they are in the population, and if he 
decides that they are, he administers A or B and notes the result, until n A's
and n B's have been administered. Plainly, if E is aware, before deciding whether
an individual is in the population, which treatment is to be administered next, 
he may, not necessarily deliberately, introduce a bias into the experiment. 
This bias we call selection bias. We propose to investigate the extent to which 
a statistician S, by determining the order in which treatments are administered, 
and not revealing to E which treatment comes next until after the individual 
who is to receive it has been selected, can control this selection bias. 


Thus a design $d$ is a distribution over the set $T$ of the $\binom{2n}{n}$ sequences of length $2n$ containing $n$ A's and $n$ B's. We shall measure the bias of a design by the maximum expected number of correct guesses which an experimenter can achieve, knowing $d$, attempting to guess the successive elements of a sequence $t \in T$ selected by $d$, and being told after each guess whether or not it is correct. The distribution of the number $G$ of correct guesses depends both on $d$ and on the prediction method $p$ used by the experimenter. We shall consider particularly two designs: the truncated binomial, in which the successive treatments are selected independently with probability ½ each until $n$ treatments of one kind have occurred, and the sampling design, in which all $\binom{2n}{n}$ sequences are equally likely. We shall consider particularly two prediction methods: the convergent prediction, which predicts that treatment which has hitherto occurred less often, and the divergent prediction, which predicts that treatment which has hitherto occurred more often, except that after $n$ treatments of one kind have been administered, the divergent prediction agrees with the convergent prediction that the other treatment will follow. When both treatments have occurred equally often, either method predicts A or B by tossing a fair coin, independently for each case of equality.

We find that among all designs, the truncated binomial minimizes the maximum expected number of correct guesses. For this design, the expected number of correct guesses is independent of the prediction method, and is

$$n + n\binom{2n}{n}\Big/2^{2n} \sim n + (n/\pi)^{1/2}.$$


Received March 26, 1956; revised November 2, 1956. 
¹ This investigation was supported (in part) by research grant from The National Institutes of Health, Public Health Service.







450 DAVID BLACKWELL AND J. L. HODGES, JR. 


With the truncated binomial design, the variance in the number of correct guesses is largest for the divergence strategy and is

$$3n/2 - D - D^2/4 \sim (3\pi - 2)n/2\pi - 2(n/\pi)^{1/2},$$

where $D = n\binom{2n}{n}\big/2^{2n-1}$, and is smallest for the convergence strategy, and is

$$n/2 - D^2/4 \sim (\pi - 2)n/2\pi.$$

For the sampling design, convergent prediction maximizes the expected number of correct guesses; this maximum is

$$n + 2^{2n-1}\Big/\binom{2n}{n} - \tfrac12 \sim n + (\pi n/4)^{1/2}.$$


Finally we note that, if treatments are selected independently at random, bias of the kind we discuss disappears, but the treatment numbers can no longer be preassigned. Three such designs are considered: the fixed total design, in which the total number of treatments is a fixed number $s$; the fixed factor design, in which we continue until $1/X + 1/Y \le 2/n$, where $X$ is the number of A treatments and $Y$ is the number of B treatments administered; and the fixed minimum design, in which we continue until $\min(X, Y) = n$. For the fixed total design, we find that, for $s = 2n + 4$, $\Pr(1/X + 1/Y \le 2/n) \approx 0.955$ for large $n$; at the expense of 4 extra observations, we have a bias-free design whose variance factor will with probability 0.955 be smaller than that in which treatment numbers are preassigned. For the fixed factor design, the additional number of observations required to achieve the given precision has for large $n$ the distribution of the square of a normal deviate. For the fixed minimum design, in which we guarantee precision for the estimated effect of each treatment, the expected number of additional observations is roughly $1.13\,n^{1/2}$.


1. Introduction. It is widely recognized that experiments intended to compare 
two or more treatments may yield biased results if the experimental subjects are 
selected with knowledge of the treatments they are to receive. Consider as illustration an experiment in cloud seeding. From a sequence of storms the meteorologist selects 2n storms deemed suitable for seeding. Of these, n are seeded and we
compare the rainfalls they produce with those produced by the other n storms. 
If the meteorologist knows (or can guess), while considering the suitability of a 
storm, whether or not the storm will be seeded if he selects it, there exists the 
possibility that his selection will be biased. 

We shall call this selection bias. It presents a serious problem when the trials 
constituting the experiment occur sequentially in time. If it were possible to 
collect at one time a block of as many subjects as there are treatments, a simple 
random assignment of treatments to subjects would dispose of the bias. But in 
many experiments potential subjects occur singly and must be dealt with when 
they arise. For example, in clinical trials it is often essential to treat the patient 
as soon as the illness is diagnosed—the physician cannot wait until he has a 
similar patient merely to permit randomization of the bias. 

SELECTION BIAS 451

In some cases it is possible to eliminate selection bias by conducting the experiment in such a way that the person who selects the subjects is not otherwise
involved in the experiment or is not able to discover which treatments have been 
applied. Again, it may be possible to define subject suitability with precision and 
to accept without conscious selection all subjects meeting the criteria. But often- 
times the exercise of judgment is essential if the treatments are to have a con- 
vincing test, and the best or only judges available are those most deeply involved 
in administering the treatments. Therefore we thought it interesting to see to 
what extent selection bias can be controlled through design of the experiment, i.e., through the statistician's strategy in choosing the sequence of treatments to
be given to the subjects selected by the experimenter. 

Admittedly, selection bias will usually operate subconsciously, but to sharpen 
the problem we imagine an experimenter E who is consciously seeking to produce
biased experimental results, while the statistician S is attempting to prevent this. 
To fix the problem, suppose we wish to compare two treatments, say A and B. 
It is customary to decide in advance of the experiment how many subjects will 
receive each treatment, and it is also customary to assign equal numbers of sub- 
jects to the two treatments. While we shall return to this question below, at first 
let us suppose it given that each treatment will be administered to n subjects. 

If E wishes to make it appear that A produces a greater response X than 
does B, and if he knows (or guesses) that S will assign treatment A to the next 
subject, then E will try to select a subject whose expected response E(X) is 
high. Conversely, if E anticipates a B treatment, he will select a subject with low
E(X). The results of E’s guesses and S’s assignments can be displayed in a two- 
by-two table: 


                         Number of times when S assigns
                              A              B
and E guesses    A            α            n − β
                 B          n − α            β
Total                         n              n


Suppose that the treatment effects do not differ, but that when E anticipates an A (B) treatment he selects a subject with expected response μ + Δ (μ − Δ). The A sum then has expectation nμ + Δ(2α − n) and the B sum nμ + Δ(n − 2β), so the expected difference of treatment sums is

(1.1)   $$2\Delta(\alpha + \beta - n) = 2\Delta(G - n).$$

The quantity G = α + β is thus the total number of correct guesses. If he guesses at random, E would on the average be right half the time, giving E(G − n) = 0. His ability to bias the experiment depends on getting E(G) above n.

In accordance with the foregoing analysis we now formulate our design prob- 
lem as a two-person game. The game is played in 2n moves. On each move, 
each of the players E and S privately selects one of the letters A and B, with 
the restriction that exactly n of S’s choices must be A. They then compare 





452 DAVID BLACKWELL AND J. L. HODGES, JR. 


selections; if they agree S pays E one unit. The total payoff is G, and we wish to know S's minimax strategy for minimizing E(G), i.e., the optimum design
for controlling selection bias. The value of the game will indicate to what ex- 
tent selection bias can be controlled through design. 


2. The biases of three designs. Before giving the solution of our game, we shall 
illustrate the ideas by deriving the optimum strategies for E and the correspond- 
ing biases for three designs used in experimental work. 

(i) A very common practice is the alternation of treatments, producing the treatment sequence ABAB ⋯ AB or BABA ⋯ BA. While this design is exceedingly simple and does an optimum job of spreading the treatments over time, it is about the worst possible design from the standpoint of selection bias. Since E can correctly guess every treatment, E(G) = 2n. Even if, as is sometimes done, S selects one of the two patterns at random, E can guess all but the first trial and has half a chance for that, so that E(G) = 2n − ½. (Exactly the same conclusions apply to the "Student" sandwich design ABBAABBA ⋯ ABBA.)

(ii) As just remarked, S can ensure that E's expectation of correct guessing on the initial trial is only ½ by simply choosing a treatment at random. Further,
S cannot do any better than this, since E can guarantee himself half a chance, 
whatever S may do, by guessing at random. A similar analysis applies to the 
second trial, and to all trials until one treatment has been given to n subjects. 
At that point the requirement that each treatment be given to just n subjects 
takes over, and the remainder of the subjects must be given the unexhausted 
treatment. We shall refer to this as the truncated binomial design. 

Suppose S has announced that he will adopt the truncated binomial design. What should E do, and how large can he make E(G)? In the tail of the experiment, consisting of the terminal sequence of trials having like treatment, E knows which treatment will be assigned, so he is sure to guess all of these correctly; let R denote the number of trials in the tail. We take advantage of the fact that E(G) is independent of E's strategy except in the tail, and give to E a strategy which simplifies the calculation. Suppose E guesses A every time, except of course in a tail of B's. Then G must be at least n (since n A's are used), and may in addition contain a B tail. By symmetry, E(G) = n + E(R)/2. We must now discuss the distribution of R.

To calculate the probability that R = r, notice that this may occur in two ways: the nth A treatment, or the nth B treatment, is assigned on the (2n − r)th trial. These events have equal probability $\binom{2n-r-1}{n-1}\big/2^{2n-r}$, according to the negative binomial distribution. Therefore R has a truncated negative binomial distribution,

(2.1)   $$\Pr(R = r) = \binom{2n-r-1}{n-1}\Big/2^{2n-r-1}, \qquad r = 1, 2, \dots, n.$$







By calculating E(2n − R), it is easy to establish

(2.2)   $$E(R) = n\binom{2n}{n}\Big/2^{2n-1} \sim 2\left(\frac{n}{\pi}\right)^{1/2} - \frac{1}{4(\pi n)^{1/2}} + \cdots.$$

In a similar way, calculation of E[(2n − R)(2n − R + 1)] yields $E(R^2) = 2n - E(R)$. Furthermore, $R/(2n)^{1/2}$ has asymptotically the distribution of the absolute value of a normal deviate. For example, there is about one chance in ten that R will exceed $2.32\,n^{1/2}$. Combining with the result of the previous paragraph, we see that the value of the truncated binomial design is $E(G) = n + n\binom{2n}{n}\big/2^{2n}$. The excess $E(G) - n \sim (n/\pi)^{1/2} - 1/[8(\pi n)^{1/2}] + \cdots$ is shown in Table 1 for a number of values of n.
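As a check on (2.1) and (2.2), the following Python sketch (an illustration only; the function names are ours, not the authors') sums the truncated negative binomial law exactly with rational arithmetic, compares E(R) with the closed form, confirms $E(R^2) = 2n - E(R)$, and prints $E(G) - n = E(R)/2$ beside its two-term asymptotic expansion.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def tail_distribution(n):
    """Pr(R = r), r = 1, ..., n, from (2.1) for the truncated binomial design."""
    return {r: Fraction(comb(2 * n - r - 1, n - 1), 2 ** (2 * n - r - 1))
            for r in range(1, n + 1)}


def tail_moments(n):
    """Exact (E(R), E(R^2)) computed by summing (2.1)."""
    dist = tail_distribution(n)
    assert sum(dist.values()) == 1          # (2.1) is a proper distribution
    first = sum(r * p for r, p in dist.items())
    second = sum(r * r * p for r, p in dist.items())
    return first, second


for n in (5, 10, 25):
    e1, e2 = tail_moments(n)
    closed_form = Fraction(n * comb(2 * n, n), 2 ** (2 * n - 1))   # exact part of (2.2)
    expansion = 2 * sqrt(n / pi) - 1 / (4 * sqrt(pi * n))         # asymptotic part of (2.2)
    print(n, e1 == closed_form, e2 == 2 * n - e1,
          round(float(e1) / 2, 4), round(expansion / 2, 4))
```

Exact fractions remove any doubt about rounding in the identities; the floating-point expansion is printed only for comparison.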

(iii) In the random allocation design, S selects n of the first 2n positive in- 
tegers at random without replacement, and then assigns treatment A to those 
subjects whose ordinal numbers have been selected. Another way of expressing 
this strategy is to say that on each move S selects a treatment with probability 
proportional to the number of subjects still to receive that treatment. 

It is intuitively clear that against this strategy, E should always use the convergence strategy, i.e., he should guess that treatment which has previously been less used; when there is a tie in past use, S will choose A or B with equal probability, so E's choice is arbitrary. In calculating E(G) for the sampling design it is very convenient to picture the results of S's choices as a walk on the
lattice points of the plane. We start at the point (0, 0), and move one unit to 
the right (or up) when S picks treatment A (or B). The experiment terminates 
when we reach the point (n, n). In terms of this walk, E will always guess that 
the walk will move toward the diagonal—if the walk is on the diagonal his 
guess is arbitrary. Since the walk starts and stops on the diagonal, it must 
move towards it exactly n times and away exactly n times. Therefore E’s strat- 
egy assures him of n correct guesses. In addition, there will be a number of 
steps originating on the diagonal, say T of them. 

If we denote generically by B(k) the number of successes in k binomial trials of success probability one-half, we see that G = n + B(t) when T = t. Thus E(G | T = t) = n + t/2, and E(G) = n + E(T)/2.

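The distribution of T has been studied by Feller, and it is apparent from formula (6.15) of Chapter 12 of [1] that

(2.3)   $$\Pr(T = t) = \frac{2^t\,t}{2n - t}\binom{2n-t}{n}\Big/\binom{2n}{n}, \qquad t = 1, 2, \dots, n,$$

from which it follows that, for large n, $T/(2n)^{1/2}$ has asymptotically the Rayleigh distribution (that of the length of a standard normal vector in the plane), with $\Pr\{T > x(2n)^{1/2}\} \to e^{-x^2/2}$; this agrees with the mean $(\pi n)^{1/2}$ found in (2.6). If we consider the probability of the walk passing through the point (j, j), we see that

(2.4)   $$E(T) = \sum_{j=0}^{n-1}\binom{2j}{j}\binom{2n-2j}{n-j}\Big/\binom{2n}{n}.$$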







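The identity (see [2], p. 252)

(2.5)   $$\sum_{j=0}^{n}\binom{2j}{j}\binom{2n-2j}{n-j} = 2^{2n}$$

gives, since the sum in (2.4) omits only the term j = n,

(2.6)   $$E(T) = 2^{2n}\Big/\binom{2n}{n} - 1 \sim (\pi n)^{1/2} - 1 + \tfrac{1}{8}(\pi/n)^{1/2} + \cdots.$$

The asymptotic approximation has error of 0.03 per cent at n = 5.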

Table 1 gives some values of E(G) — n computed with the aid of these formu- 
las. Notice that the truncated binomial design has in each case a smaller value 
of E(G) — n than does the random allocation design. This is not an accident, 
as we now proceed to show. 
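The entries of Table 1 follow from (2.2) and from (2.4) to (2.6). The Python sketch below assumes only those formulas (the helper names are illustrative): it evaluates E(T) by the sum (2.4), checks it against the closed form (2.6), and prints E(G) − n for both designs at several values of n.

```python
from fractions import Fraction
from math import comb


def expected_ties(n):
    """E(T) for the random allocation design, summing Pr(walk passes (j, j)) as in (2.4)."""
    total = comb(2 * n, n)
    return sum(Fraction(comb(2 * j, j) * comb(2 * n - 2 * j, n - j), total)
               for j in range(n))


def excess_truncated_binomial(n):
    """E(G) - n = n C(2n, n) / 2^(2n) for the truncated binomial design."""
    return Fraction(n * comb(2 * n, n), 4 ** n)


def excess_random_allocation(n):
    """E(G) - n = E(T)/2 for the random allocation (sampling) design."""
    return expected_ties(n) / 2


for n in (5, 10, 15, 20, 25, 30, 40, 50, 100):
    assert expected_ties(n) == Fraction(4 ** n, comb(2 * n, n)) - 1   # closed form (2.6)
    print(f"{n:4d} {float(excess_truncated_binomial(n)):6.2f} "
          f"{float(excess_random_allocation(n)):6.2f}")
```

In every row the truncated binomial value is the smaller, which is what Theorem 1 below explains.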


3. Solution of the game. 

THEOREM 1. The truncated binomial design is the solution of our game. 

In proving this theorem, it is helpful to generalize the problem to permit 
different preassigned numbers of subjects for the treatments. Let D(m, k) de- 
note the design problem when we are required to use A just m times and B 
just k times. By analogy, we say S uses the truncated binomial design if he 
chooses treatments independently and at random until one of the treatments 
is exhausted. As in the special case D(n, n) it is easy to see that E(G) does not 
depend on E’s strategy (provided always that he guesses the obvious in the 
tail) when S uses the truncated binomial strategy. If we denote this invariant 
value of E(G) by φ(m, k) for the problem D(m, k), we easily find that


$$\phi(m, 0) = m, \qquad \phi(0, k) = k,$$
$$\phi(m, k) = [1 + \phi(m, k-1) + \phi(m-1, k)]/2 \qquad \text{for } m, k > 0.$$

For future reference we note that

(3.1)   $$|\phi(m-1, k) - \phi(m, k-1)| < 1.$$


TABLE 1
E(G) − n

    n      Truncated Binomial Design     Sampling Design
    5               1.23                      1.53
   10               1.76                      2.34
   15               2.17                      2.96
   20               2.51                      3.49
   25               2.81                      3.95
   30               3.08                      4.37
   40               3.56                      5.12
   50               3.98                      5.78
  100               5.63                      8.37
 large n        0.564 n^{1/2}             0.886 n^{1/2}













This is obvious for m + k = 2, and since φ on the line m + k = s + 1 is just one-half more than the average of consecutive values on the line m + k = s, (3.1) holds in general.

Our design problems D are inductively related. Suppose we have checked our theorem for the design problems D(m − 1, k) and D(m, k − 1), showing that the truncated binomial strategy solves these, yielding values φ(m − 1, k) and φ(m, k − 1) respectively. We now consider the game D(m, k). After the first move we shall be faced with one of the former games. Therefore the payoff matrix can be expressed in terms of the choices of E and S on the first move only. In fact, the expected payoffs are given by


                             S
                    A                    B
E      A    1 + φ(m − 1, k)        φ(m, k − 1)
       B    φ(m − 1, k)            1 + φ(m, k − 1)


Now we hope to show that S should choose the columns with equal probabilities. Therefore, let us try to find a strategy for E which will make these columns equally attractive to S. If E chooses the first row with probability p, column A yields p + φ(m − 1, k) and column B yields 1 − p + φ(m, k − 1); equating the two leads at once to having E choose the first row with probability

(3.2)   $$[1 + \phi(m, k-1) - \phi(m-1, k)]/2.$$

(That (3.2) is indeed a probability follows from (3.1).) The game is now solved, since (a) when E uses (3.2), the options are equally attractive to S, who is then content to choose them at random, while (b) S's random choice makes E indifferent and hence content with (3.2).
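The recursion for φ and E's equalizing strategy (3.2) are easy to tabulate. The Python sketch below is an illustration under our own naming: it uses exact fractions so that the equalization of the two columns, the bound (3.1), and the agreement of φ(n, n) with n + n C(2n, n)/2^{2n} from Section 2 can be checked without rounding.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def phi(m, k):
    """Value E(G) of D(m, k) under the truncated binomial design (recursion of Section 3)."""
    if m == 0:
        return Fraction(k)
    if k == 0:
        return Fraction(m)
    return (1 + phi(m, k - 1) + phi(m - 1, k)) / 2


def first_row_probability(m, k):
    """Probability (3.2) with which E guesses A on the first move of D(m, k)."""
    return (1 + phi(m, k - 1) - phi(m - 1, k)) / 2


def column_values(m, k, p):
    """Expected payoff against S's columns A and B when E plays row A with probability p."""
    col_a = p * (1 + phi(m - 1, k)) + (1 - p) * phi(m - 1, k)
    col_b = p * phi(m, k - 1) + (1 - p) * (1 + phi(m, k - 1))
    return col_a, col_b


p = first_row_probability(7, 4)
print(p, column_values(7, 4, p), phi(7, 4))
```

Both column values come out equal to φ(m, k), i.e., the value of D(m, k), as the argument above requires.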

Incidentally, our game has an interesting feature. When either player uses 
his minimax strategy, the expected outcome of the game is independent of the 
strategy adopted by the other player. Notice also that we have shown the 
truncated binomial design to be the solution of the general design problem 
D(m, k), with preassigned but possibly different treatment numbers. 

We remark that although this design minimizes E(G), the minimized value is disturbingly large. If we divide the difference of treatment sums (1.1) by $n^{1/2}$, as is customary in standardizing it, the expected value is about $2\Delta/\pi^{1/2}$ (since $E(G) - n \sim (n/\pi)^{1/2}$), which does not tend to 0 as n → ∞. In many experimental situations Δ could be large enough to produce a serious distortion.


4. The variance of G. When S uses the truncated binomial design, the value of 
E(G) is independent of E's strategy, but it should not be thought that E is
unable to influence other aspects of the distribution of G. For example, if E 
guesses the treatment A, as long as that treatment is possible, he is assured of at 
least n correct guesses, while if he guesses at random, G can be as low as 1. We 
shall in particular examine the influence of E’s strategy on the variance V(G). 
This would be an essential quantity in computing the expectation of a payoff 
function which can be represented by a quadratic function of G, or in approxi- 
mating the probability that the estimated treatment effect exceeds a specified 







critical value. (If E believes that the treatments are not different, but wishes as 
large as possible a probability of having the difference appear highly significant, 
he would want V(G) large.) 

Our methods permit us to find the strategy for E which will maximize (mini- 
mize) V(G). We have introduced above the convergence strategy, according to 
which E always guesses the hitherto least frequently used treatment. Opposite to 
this is the divergence strategy: as long as both treatments are available, E guesses 
the one which has been most used; when there is a tie, he guesses at random; while 
as always in the tail he guesses the treatment certain to be used. 

THEOREM 2. Against the truncated binomial design, V(G) is maximized (minimized) when E uses the divergence (convergence) strategy.

Since E(G) is constant it will suffice to prove the corresponding assertion for $E(G^2) = V(G) + E^2(G)$. Consider the problem D(m, k) where to avoid obvious cases we assume mk > 0. Let E employ the pure strategy of guessing A on the first trial. If S assigns A, E wins 1 on that trial and is faced with the game D(m − 1, k), in which E wins, say, H. If S assigns B, E wins nothing on the first trial and then must play D(m, k − 1), winning K. As the assignments are equally likely, and E(H) = φ(m − 1, k),

$$E(G^2) = [E(1 + H)^2 + E(K^2)]/2 = \tfrac12 + \phi(m-1, k) + \tfrac12 E(H^2 + K^2).$$

Similarly, if E adopts pure strategy B on the first trial,

$$E(G^2) = \tfrac12 + \phi(m, k-1) + \tfrac12 E(H^2 + K^2).$$

Now the distributions of H and K depend on the strategies adopted in playing D(m − 1, k) and D(m, k − 1) respectively, but not on the guess which E makes in the first trial. Therefore, $E(G^2)$ will be maximized when E guesses A on the first trial if φ(m − 1, k) > φ(m, k − 1), and when E guesses B if the inequality is reversed. As φ(m, s − m), viewed as a function of m, is an increasing function of |m − s/2|, we see that the divergence strategy will maximize $E(G^2)$. A similar argument shows that $E(G^2)$ is minimized by the convergence strategy.
As argued in Section 2, when E adopts the convergence strategy, and the walk has t ties, G = n + B(t). Thus G has as its distribution a mixture of binomials, the mixing coefficients being given by the distribution of T. We shall derive this by considering first the joint distribution of T and R. Denote

$$\Pr(T = t \text{ and } R = r)$$

by π(t, r), and observe that these variables have the range 3 ≤ t + r ≤ n + 1 (t, r ≥ 1, n ≥ 2).

It is remarkable that π(t, r) depends only on t + r. This can be seen by establishing a two-to-one mapping of the walks with values (t + 1, r) onto walks with values (t, r + 1). Consider any walk W with T = t + 1, R = r. Let W′ be the walk identical with W except that the part after the last tie has been reflected about the diagonal; W′ also has T = t + 1, R = r. Each of these walks has probability $1/2^{2n-r}$ under the truncated binomial design. Now locate on W (or W′) the point immediately preceding the last tie, and denote its coordinates by (x, y). We shall assume y = x + 1, the case x = y + 1 being argued similarly. Suppose W is the walk having its last part above the diagonal. We now create from W a new walk W*, by (a) eliminating the step from (x, y) to (x + 1, y), (b) shifting one step to the left that part of W from (x + 1, y) to the boundary, and (c) closing the gap thus created by adding a step to the tail. Note that the correspondence between the pair (W, W′) and W* is one-to-one, that W* has probability $1/2^{2n-r-1}$, and that it has T = t, R = r + 1.

As a corollary (π being symmetric in its two arguments) we observe that T and R are identically distributed. Since we have already obtained the distribution of R (2.1), we can now give that of G:

$$\Pr(G = g) = \sum_{t=1}^{n}\binom{t}{g-n}\binom{2n-t-1}{n-1}\Big/2^{2n-1}, \qquad g = n, n+1, \dots, 2n.$$

V(G) can also easily be calculated. We have

$$E(G^2 \mid T = t) = n^2 + nt + (t + t^2)/4,$$

so that $E(G^2) = n^2 + (n + \tfrac14)E(T) + \tfrac14 E(T^2) = n^2 + n/2 + nE(R)$. Since E(G) = n + E(R)/2, we get

(4.1)   $$V(G) = \frac{n}{2} - \frac{E^2(R)}{4} \sim \frac{\pi - 2}{2\pi}\,n + \frac{1}{4\pi}.$$

This is the smallest value which V(G) can have.
Since π(t, r) depends only on the sum of its arguments,

$$\pi(t, r) = \pi(t + r - 1, 1).$$

A walk which has T = t + r − 1 and R = 1 must have just t + r − 2 ties before reaching the point (n − 1, n − 1). Each such walk has probability $(1/2)^{2n-1}$, and the number of them can be read at once from (2.3). It follows that, for n > 1,

(4.2)   $$\pi(t, r) = \frac{t + r - 2}{2n - t - r}\binom{2n-t-r}{n-1}\Big/2^{2n-t-r}.$$
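We shall need E(RT). If we let $U_k$ indicate a tie at (k, k), so that $T = U_0 + U_1 + \cdots + U_{n-1}$, we see that, since after a tie at (k, k) the rest of the walk is the truncated binomial design D(n − k, n − k) with E(R) given by (2.2),

$$E(RT) = \sum_{k=0}^{n-1} P(U_k = 1)\,E(R \mid U_k = 1) = \sum_{k=0}^{n-1}\binom{2k}{k}2^{-2k}\,(n-k)\binom{2n-2k}{n-k}2^{-(2n-2k-1)}.$$

Again making use of the identity of Section 2 (see [2], p. 252), and pairing the terms k and n − k so that the factor n − k may be replaced by n/2, we find that E(RT) = n.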
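The claims that π(t, r) depends only on t + r, that T and R are identically distributed, and that E(RT) = n can be confirmed by brute force for small n. The Python sketch below (illustrative names; its running time is exponential, so it is only for small n) enumerates all $2^{2n-1}$ equally likely sequences of fair coin flips, ignores flips made after one treatment is exhausted, and compares the resulting joint law with (4.2).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb


def joint_T_R(n):
    """Exact joint law pi(t, r) of (T, R) under the truncated binomial design on D(n, n)."""
    pi = defaultdict(Fraction)
    weight = Fraction(1, 2 ** (2 * n - 1))       # at most 2n - 1 assignments are random
    for flips in product('AB', repeat=2 * n - 1):
        x = y = ties = 0
        for f in flips:
            if x == n or y == n:                 # one treatment exhausted: the tail begins
                break
            if x == y:                           # this step starts on the diagonal
                ties += 1
            if f == 'A':
                x += 1
            else:
                y += 1
        tail = 2 * n - x - y                     # remaining trials, all of one treatment
        pi[(ties, tail)] += weight
    return pi


def pi_formula(n, t, r):
    """Closed form (4.2), valid for n > 1 and 3 <= t + r <= n + 1."""
    s = t + r
    return Fraction((s - 2) * comb(2 * n - s, n - 1), (2 * n - s) * 2 ** (2 * n - s))


n = 6
pi = joint_T_R(n)
print(all(p == pi_formula(n, t, r) for (t, r), p in pi.items()))
print(sum(t * r * p for (t, r), p in pi.items()))    # E(RT)
```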
In computing V(G) for the divergence strategy, note that when T = t and R = r, G = n − t + r + B(t). Therefore,

$$E(G^2 \mid T = t, R = r) = (n + r - t)(n + r) + (t + t^2)/4.$$







Using the relations E(T) = E(R), $E(T^2) = E(R^2) = 2n - E(R)$, and E(RT) = n, we find after simplifying

(4.3)   $$V(G) = \frac{3n}{2} - E(R) - \frac{E^2(R)}{4} \sim \frac{3\pi - 2}{2\pi}\,n - 2\left(\frac{n}{\pi}\right)^{1/2} + \frac{1}{4\pi} + \cdots.$$

This is the greatest value which V(G) can attain.

Note that the range of V(G) is quite large. The ratio of maximum to minimum values tends with large n to the ratio of the leading coefficients in (4.3) and (4.1), (3π − 2)/(π − 2) ≈ 6.50.
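Theorem 2 and the extreme variances (4.1) and (4.3) can be checked by propagating the exact distribution of G over the lattice walk. The Python sketch below is our own illustration (the state encoding and helper names are hypothetical, not the authors'): S follows the truncated binomial design, E follows the convergence or the divergence strategy and always guesses the tail correctly, and the final mean and variance can be compared with (2.2), (4.1) and (4.3).

```python
from fractions import Fraction
from math import comb

HALF = Fraction(1, 2)


def guess(strategy, x, y, n):
    """E's guess at position (x, y); None stands for a fair coin (used at ties)."""
    if x == n:
        return 'B'                      # tail: only B can be assigned
    if y == n:
        return 'A'                      # tail: only A can be assigned
    if x == y:
        return None                     # tie: both strategies guess at random
    if strategy == 'convergence':       # guess the less used treatment
        return 'A' if x < y else 'B'
    return 'A' if x > y else 'B'        # divergence: guess the more used treatment


def moments_of_G(n, strategy):
    """Exact (E(G), V(G)) when S uses the truncated binomial design on D(n, n)."""
    layer = {(0, 0): {0: Fraction(1)}}  # position -> {correct guesses so far: probability}
    for _ in range(2 * n):
        nxt = {}
        for (x, y), dist in layer.items():
            if x == n:
                moves = [('B', Fraction(1))]
            elif y == n:
                moves = [('A', Fraction(1))]
            else:
                moves = [('A', HALF), ('B', HALF)]
            g = guess(strategy, x, y, n)
            for treatment, p_move in moves:
                hit = HALF if g is None else Fraction(int(g == treatment))
                pos = (x + 1, y) if treatment == 'A' else (x, y + 1)
                bucket = nxt.setdefault(pos, {})
                for correct, p in dist.items():
                    for add, p_add in ((1, hit), (0, 1 - hit)):
                        if p_add:
                            key = correct + add
                            bucket[key] = bucket.get(key, 0) + p * p_move * p_add
        layer = nxt
    final = layer[(n, n)]
    mean = sum(g * p for g, p in final.items())
    var = sum(g * g * p for g, p in final.items()) - mean ** 2
    return mean, var


n = 10
e_r = Fraction(n * comb(2 * n, n), 2 ** (2 * n - 1))   # E(R) from (2.2)
for strategy in ('convergence', 'divergence'):
    print(strategy, *map(float, moments_of_G(n, strategy)))
print(float(Fraction(n, 2) - e_r ** 2 / 4), float(Fraction(3 * n, 2) - e_r - e_r ** 2 / 4))
```

The two strategies give the same mean, as Section 3 requires, while their variances match the closed forms printed on the last line.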


5. Completely binomial designs. Since it is not possible to find a design with 
adequate bias control when the treatment numbers are preassigned, we shall 
now examine some designs free of this restriction. In the present section we 
shall assume that each subject has probability ½ of receiving each treatment,
and that the assignments are independent. Such completely binomial designs 
are bias-free, in the sense that every guessing strategy will produce a G whose 
expectation, given the number s of trials, is exactly s/2. We can still exercise 
a measure of control over the experiment through the decision to terminate it. 
In our geometrical picture, the design of the experiment now consists in specify- 
ing a set of points in the plane at which experimentation will stop. Each such 
sequential stopping rule will provide a distribution of the numbers X and Y 
of subjects receiving treatments A and B, respectively, leading to distributions 
of the total number of trials X + Y = S and of the variance factor 
1/X +1/Y = V. 

(i) Fixed total design. In some experiments it may be necessary or desirable to know in advance the total number s of trials to be performed. This leads to the stopping rule x + y = s, for which the variance factor V is variable and indeed unbounded: if x or y is 0, V = ∞. However, since X = B(s), if s is large it is unlikely that X will be far from s/2, and V will probably not much exceed its minimum value 4/s. In fact, if we expand V in powers of (X − s/2), we find

$$V = \frac{1}{X} + \frac{1}{s - X} = \frac{4}{s} + \frac{16(X - s/2)^2}{s^3} + \cdots.$$

Here $2(X - s/2)/s^{1/2}$ is approximately distributed as a normal deviate Z, so the second term is approximately $4Z^2/s^2$. If $K_\alpha$ denotes the upper α/2 point on the normal distribution, and if we want to have V ≤ 2/n with probability 1 − α, we should choose s so that

$$\frac{2}{n} = \frac{4}{s} + \frac{4K_\alpha^2}{s^2},$$

or $s \approx 2n + K_\alpha^2$. For example, if we set s = 2n + 4, we shall be for large n about 95.5 per cent sure of obtaining a bias-free experiment with variance factor smaller than that obtainable with X = Y = n preassigned.
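The normal approximation in (i) can be compared with the exact binomial probability. A minimal Python sketch, assuming X is exactly B(s) and clearing fractions in 1/X + 1/Y ≤ 2/n (the function names are illustrative):

```python
from math import comb, erf, sqrt


def prob_factor_met(n, s):
    """Exact Pr(1/X + 1/Y <= 2/n) for the fixed total design, X ~ B(s), Y = s - X."""
    # For x, y > 0 with x + y = s, 1/x + 1/y <= 2/n is equivalent to n*s <= 2*x*y.
    good = sum(comb(s, x) for x in range(1, s) if n * s <= 2 * x * (s - x))
    return good / 2 ** s


def normal_approximation(n, s):
    """Pr(|Z| <= K) with s = 2n + K^2, the large-n approximation of Section 5(i)."""
    k = sqrt(s - 2 * n)
    return erf(k / sqrt(2))


for n in (10, 50, 200, 1000):
    s = 2 * n + 4
    print(n, round(prob_factor_met(n, s), 4), round(normal_approximation(n, s), 4))
```

For s = 2n + 4 the approximation is 0.9545; the exact probability approaches it as n grows.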

(ii) Fixed factor design. Instead of fixing S and permitting V to vary, we might often prefer to fix V and permit S to vary. For example, we could continue taking observations until X and Y satisfy

(5.1)   $$\frac{1}{X} + \frac{1}{Y} \le \frac{2}{n}.$$

Write X + Y = S = 2n + U, so that U ≥ 0 (since 1/X + 1/Y ≥ 4/(X + Y)) may be viewed as the number of additional observations required to obtain freedom from bias.

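For a given value U = u, let $x_u$ denote the smallest value of X for which (5.1) holds, and define $y_u$ by $x_u + y_u = 2n + u$; i.e.,

(5.2)   $$\frac{1}{x_u} + \frac{1}{y_u} \le \frac{2}{n} < \frac{1}{x_u - 1} + \frac{1}{y_u + 1}.$$

Since 1/X + 1/Y only decreases as observations are added, a path will yield U ≤ u if and only if, at the (2n + u)th step, it is at, say, $(X^*, Y^*)$ with $x_u \le X^* \le y_u$. As $X^* = B(2n + u)$, the distribution of U is now easily calculated.

As n → ∞, U has a simple limit law. From (5.2) it appears that for large n, $y_u \approx (n + u/2) + u^{1/2}(2n + u)^{1/2}/2$. Since the binomial $X^*$ has $E(X^*) = n + u/2$ and $\sigma_{X^*} = (2n + u)^{1/2}/2$, we have

$$\Pr(U \le u) \approx \Phi(u^{1/2}) - \Phi(-u^{1/2}), \qquad u = 0, 1, 2, \dots,$$

where Φ is the standard normal distribution function.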
Table 2 compares the distributions of U for n = 5, 10, 20, and ∞. We see that on the average it costs about one and one-half observations, and is practically certain not to cost as many as ten observations, to eliminate selection bias entirely. (Even this comparison is unfair to the bias-free design, as the inequality (5.1) is usually strict and we are obtaining a somewhat more accurate estimate. If we were to take the final step with a probability adjusted to make E(1/X + 1/Y) = 2/n, we should find E(U) → 1 as n → ∞.)


TABLE 2
Fixed factor design
Distribution of U = excess observations required

   u      Pr(U ≤ u), n = 10
   0           .176
   1           .617
   2           .866
   3           .907
   4           .936
   5           .985
   6           .991
   7           .994
   8           .996
   9           .998
  10           .999
              1.464
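The exact distribution of U follows from (5.1) and (5.2) by summing binomial probabilities at step 2n + u. The Python sketch below (illustrative names; the series for E(U) is truncated after a cutoff we chose) should regenerate the column of probabilities printed in Table 2, which corresponds to n = 10, and prints the limiting law $\Phi(u^{1/2}) - \Phi(-u^{1/2})$ alongside.

```python
from math import comb, erf, sqrt


def cdf_U(n, u):
    """Exact Pr(U <= u) for the fixed factor design stopped by (5.1).

    Once 1/X + 1/Y <= 2/n holds it keeps holding, so U <= u exactly when the
    binomial walk satisfies (5.1) at step s = 2n + u, i.e. x_u <= X* <= y_u.
    """
    s = 2 * n + u
    good = sum(comb(s, x) for x in range(1, s) if n * s <= 2 * x * (s - x))
    return good / 2 ** s


def cdf_U_limit(u):
    """Large-n limit Phi(u^(1/2)) - Phi(-u^(1/2)) = erf((u/2)^(1/2))."""
    return erf(sqrt(u / 2))


def mean_U(n, terms=200):
    """E(U) = sum over u >= 0 of Pr(U > u), truncated after `terms` terms."""
    return sum(1 - cdf_U(n, u) for u in range(terms))


for u in range(11):
    print(u, round(cdf_U(10, u), 3), round(cdf_U_limit(u), 3))
print(round(mean_U(10), 3))
```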










In practice, the fixed factor design would be used in a truncated form. For 
example, we could continue the binomial choice until (5.1) holds, or until 
X + Y = 2n + u. If the latter eventuates first, the deficient treatment could 
be applied until (5.1) holds. By setting u = 10, we would have practical certainty of a bias-free design, without the theoretical possibility of an infinite sequence of trials.

(iii) Fixed minimum design. In some cases we might wish to guarantee the 
precision of estimation for each treatment effect separately, rather than for 
their difference. We should then need 


min (X, Y) = n. 


By symmetry, we are equally likely to stop at (n, x) and at (x, n), so it will suffice to consider the probabilities of stopping at points (x, n) for x = n, n + 1, ⋯. These probabilities are easily seen to be proportional to those of the single negative binomial design, which is stopped by y = n. Thus our X has a truncated negative binomial distribution, with range just the complement of that of n − R considered in Section 2. As each of the ranges is equally likely, we must have ½E(S) + ½E(2n − R) equal to the expected number of steps in the single negative binomial, which is 2n. Therefore E(S) = 2n + E(R), where E(R) is given by (2.2). Since $E(R) \sim 2(n/\pi)^{1/2} \approx 1.13\,n^{1/2}$, we must roughly expend on the average $1.13\,n^{1/2}$ additional observations in this case.
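The identity E(S) = 2n + E(R) can be checked by summing the stopping probabilities of the fixed minimum design directly. In this Python sketch (an illustration; the infinite series is truncated at a cutoff `x_max` that we chose), stopping at (x, n) with x ≥ n has probability $\binom{x+n-1}{n-1}/2^{x+n}$, and the points (n, x) contribute equally by symmetry.

```python
from math import comb


def expected_total_fixed_minimum(n, x_max=3000):
    """E(S) for the fixed minimum design, stopping when min(X, Y) = n.

    Stopping at (x, n), x >= n, means the nth B arrives after x A's:
    probability C(x + n - 1, n - 1) / 2^(x + n); the points (n, x) are symmetric.
    The series is truncated at x_max, where its terms are negligible for moderate n.
    """
    return sum(2 * (x + n) * comb(x + n - 1, n - 1) / 2 ** (x + n)
               for x in range(n, x_max))


def expected_tail(n):
    """E(R) from (2.2)."""
    return n * comb(2 * n, n) / 2 ** (2 * n - 1)


for n in (1, 5, 20, 50):
    print(n, expected_total_fixed_minimum(n), 2 * n + expected_tail(n), 1.13 * n ** 0.5)
```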


6. Extensions. A good deal of the preceding argument generalizes rather easily to experiments involving more than two treatments. In particular, the minimax
design for preassigned treatment numbers consists in choosing at each step 
among the remaining treatments with equal probabilities. A simple bias-free 
design, which generalizes 5(ii), consists in choosing a treatment at each step, 
with equal probabilities, and terminating the experiment when the sum of recipro- 
cals of treatment numbers falls below a preassigned level. 


REFERENCE 


[1] Feller, William, An Introduction to Probability Theory and Its Applications, Vol. 1, John Wiley & Sons, 1950.





ON INFINITELY DIVISIBLE RANDOM VECTORS¹


By Meyer Dwass and Henry Teicher
Northwestern and Stanford Universities; Purdue and Stanford Universities 


1. Summary. A normally distributed random vector X is well known to be representable by A·Y (in the sense of having identical distributions), where A
is a matrix of constants and Y is a random vector whose component random 
variables are independent. A necessary and sufficient condition for any infinitely 
divisible random vector to be so representable is given. The limiting case is 
discussed as are connections with the multivariate Poisson distribution and 
stochastic processes. 


2. Notation and preliminaries. Let (Ω, ℬ, 𝒫) be a probability space; that is, Ω is an abstract point set, ℬ a Borel field of subsets of Ω, and 𝒫 a probability measure defined on ℬ. If m ≥ 1 is an integer and X, Y, Z, ⋯ a set of m-dimensional vectors defined on Ω, we write X ~ Y to signify that the associated probability measures (or distribution functions) of X and Y are identical. Since the relationship indicated by ~ is reflexive, symmetric and transitive, the use of this symbol is in the best of taste and tradition.

We abbreviate the terms cumulative distribution function, characteristic function, random vector, and infinitely divisible by c.d.f., c.f., r.v., and i.d., respectively, and occasionally string some of these together. A bar over a set signifies complementation, and the notation $R^m$ is used for m-dimensional Euclidean space.


3. Infinitely divisible vectors. Recall that an r.v. X and likewise its c.d.f., say $F(x_1, x_2, \dots, x_m)$, and its c.f., say $\phi(t_1, t_2, \dots, t_m)$, are called i.d.² if for every positive integer n, X ~ sum of n independent (identically distributed) r.v.'s. P. Lévy ([4], p. 220) has given a necessary and sufficient condition (NSC) that X be i.d., viz.,

(1)   $$\phi(t_1, \dots, t_m) = \exp\left\{ i\sum_{j=1}^m \gamma_j t_j - \frac12\sum_{j,k=1}^m \sigma_{jk}t_jt_k + \int_{R^m}\left[e^{i(t_1u_1 + \cdots + t_mu_m)} - 1 - \frac{i(t_1u_1 + \cdots + t_mu_m)}{1 + |u|^2}\right]dN(u_1, \dots, u_m)\right\},$$


Received May 28, 1956; revised November 12, 1956. 

¹ This work was supported in part by an Office of Naval Research contract at Stanford University.

² In many works X is defined to be i.d. if for every positive integer n, $X = X_{n1} + X_{n2} + \cdots + X_{nn}$, where $X_{n1}, X_{n2}, \dots, X_{nn}$ are mutually independent. Such a definition places demands on the basic space Ω. A discussion of this point occurs in Appendix 2 of "Limit Distributions for Sums of Independent Random Variables" by Gnedenko-Kolmogoroff, Addison-Wesley. The current definition obviates such questions.







462 MEYER DWASS AND HENRY TEICHER 


where $\gamma_j$ are real coefficients, $\Sigma = \{\sigma_{jk}\}$ is a positive semidefinite matrix, |u| is the Euclidean length of the row vector $u' = (u_1, u_2, \dots, u_m)$, and $\mu_N(A) = \int_A dN(u_1, \dots, u_m)$ is a nonnegative additive set function (not necessarily finite) defined on the Borel sets A of $R^m$ and such that

(1′)   $$\int_{S_\epsilon} |u|^2\,dN(u_1, \dots, u_m) < \infty, \qquad \int_{\bar S_\epsilon} dN(u_1, \dots, u_m) < \infty$$

(with $S_\epsilon$ an m-dimensional sphere of radius ε > 0 centered at the origin and $\bar S_\epsilon$ its complement in $R^m$) for arbitrary ε.

Let φ(t), N(u), and G(u) abbreviate $\phi(t_1, \dots, t_m)$, $N(u_1, \dots, u_m)$ and $G(u_1, \dots, u_m)$, respectively. Analogous to the one-dimensional case, an alternative and frequently more convenient form of (1) is given by

(2)   $$\phi(t) = \exp\{i\gamma' t - \tfrac12 t'\Sigma t\}\cdot\exp\left\{\int_{R^m}\left(e^{it'u} - 1 - \frac{it'u}{1 + |u|^2}\right)\frac{1 + |u|^2}{|u|^2}\,dG(u)\right\},$$

where t, u, γ are column vectors and

$$\mu_G(A) = \int_A dG(u) = \int_A \frac{|u|^2}{1 + |u|^2}\,dN(u)$$

is, in view of (1′), a finite Lebesgue-Stieltjes measure on the Borel sets of $R^m$ which may be taken to vanish for A = {u: |u| = 0}.³ Thus, any i.d. c.f. may be characterized by a triplet [γ, Σ, G].⁴

The first factor in (2) is obviously the c.f. of a multivariate normal distribution, while the second is generated by the Poisson distribution in a sense which will be made more precise later. Thus, every i.d. vector $X \sim X^{(1)} + X^{(2)}$, where $X^{(1)}$ is multinormal and independent of $X^{(2)}$, which will be said to be "Poisson type." It will be convenient to refer to this as the canonical representation of X. If $X^{(2)} = 0$, X will be called "purely normal," while if $X^{(1)} = 0$, X will be dubbed "purely Poisson type."


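If an i.d. vector $X = \begin{pmatrix} U \\ V \end{pmatrix}$ is partitionable into subvectors, one of which (say U) is purely normal and the other, V, purely Poisson type, then U and V must be independent. For $\begin{pmatrix} U \\ V \end{pmatrix} \sim \begin{pmatrix} U^{(1)} \\ V^{(1)} \end{pmatrix} + \begin{pmatrix} U^{(2)} \\ V^{(2)} \end{pmatrix}$ with $\begin{pmatrix} U^{(1)} \\ V^{(1)} \end{pmatrix}$ purely normal and independent of $\begin{pmatrix} U^{(2)} \\ V^{(2)} \end{pmatrix}$, which is purely Poisson type. But U purely normal implies $U^{(2)} = 0$, and V purely Poisson requires $V^{(1)} = 0$. The assertion follows. This observation may be utilized to construct a non-i.d. vector, all of whose marginal random variables are i.d.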


³ For the most frequently encountered case m = 1, this G(u) is not in general identical with that used by the authors of the book mentioned in Footnote 2.

⁴ It does not appear to have been remarked (even for the case m = 1) that a bounded r.v. is i.d. if and only if it is constant with probability one. This may be argued directly from the definition without resorting to (2).





RANDOM VECTORS 


The fact that for an arbitrary nonzero constant vector $c' = (c_1, c_2, \dots, c_m)$ and c.f. $\phi_{X_1, X_2, \dots, X_m}(t_1, t_2, \dots, t_m)$,

$$\phi(c_1t, c_2t, \dots, c_mt) = E[\exp\{i(c_1X_1 + \cdots + c_mX_m)t\}] = \phi_{c'X}(t)$$

shows immediately that if X is an i.d. vector, every linear combination c′X of its component random variables is i.d.: if φ is the nth power of a c.f. $\psi_n$, then $\phi_{c'X}$ is the nth power of $\psi_n(c_1t, \dots, c_mt)$. The converse, however, is untrue. That is, it is possible for every linear combination c′X to be i.d. without the vector X being i.d.⁵ The Wishart distribution provides an example.

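For the c.f. of a so-called Γ-variable is well known to be $[1 - (it/\alpha)]^{-\lambda}$, λ, α > 0, and is manifestly i.d. Hence, the c.f. $[(1 - it_1)(1 - it_2)]^{-\lambda}$, λ > 0, of a pair of independent Γ-variables is clearly i.d., whence by the remarks at the beginning of the preceding paragraph, $[(1 - ic_1t)(1 - ic_2t)]^{-\lambda} = [1 - i(c_1 + c_2)t - c_1c_2t^2]^{-\lambda}$ is i.d. for arbitrary real constants $c_1, c_2$. But if $Z_j = (Z_{1j}, Z_{2j})$, j = 1, ⋯, n, are independent normally distributed vectors with mean vector zero and common covariance matrix Σ, then

$$X' = (X_1, X_2, X_3) = \left(\sum_{j=1}^n Z_{1j}^2,\ \sum_{j=1}^n Z_{2j}^2,\ \sum_{j=1}^n Z_{1j}Z_{2j}\right)$$

has the Wishart distribution with c.f.

$$\phi_{X_1, X_2, X_3}(t_1, t_2, t_3) = [1 - 2i(\sigma_{11}t_1 + \sigma_{22}t_2 + \sigma_{12}t_3) + 4(\sigma_{11}\sigma_{22} - \sigma_{12}^2)(t_3^2/4 - t_1t_2)]^{-n/2}.$$

Thus, every linear combination b′X has the c.f.

$$[1 - 2i(b_1\sigma_{11} + b_2\sigma_{22} + b_3\sigma_{12})t + 4(\sigma_{11}\sigma_{22} - \sigma_{12}^2)(b_3^2/4 - b_1b_2)t^2]^{-n/2}.$$

Since the bracket factors as $(1 - ic_1t)(1 - ic_2t)$ with $c_1, c_2$ real (twice the eigenvalues of ΣB, where $B = \begin{pmatrix} b_1 & b_3/2 \\ b_3/2 & b_2 \end{pmatrix}$ is the matrix of the quadratic form b′X), this c.f. is i.d. by the preceding remarks. On the other hand, P. Lévy has shown [5] that the Wishart distribution is not itself i.d. and that for n = 1, it is even indecomposable.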
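The step from the Wishart c.f. to the i.d. claim for b′X can be checked numerically. The Python sketch below is only an illustration (Σ, b and t are arbitrary test values we chose, and all names are ours): it evaluates the displayed c.f., finds real $c_1, c_2$ as twice the eigenvalues of ΣB with $B = \begin{pmatrix} b_1 & b_3/2 \\ b_3/2 & b_2 \end{pmatrix}$, and includes a Monte Carlo estimate of the c.f. for comparison with the closed form.

```python
import cmath
import random


def wishart_cf(sigma, n, t):
    """c.f. of X = (sum Z1^2, sum Z2^2, sum Z1*Z2) for n i.i.d. N(0, sigma) pairs."""
    (s11, s12), (_, s22) = sigma
    t1, t2, t3 = t
    det = s11 * s22 - s12 ** 2
    base = 1 - 2j * (s11 * t1 + s22 * t2 + s12 * t3) + 4 * det * (t3 ** 2 / 4 - t1 * t2)
    return base ** (-n / 2)             # principal branch; adequate for small |t|


def gamma_scales(sigma, b):
    """Real c1, c2 with (1 - i c1 t)(1 - i c2 t) equal to the bracket for b'X.

    They are twice the eigenvalues of sigma*B, B = [[b1, b3/2], [b3/2, b2]],
    so b'X is distributed as c1*G1 + c2*G2 with G1, G2 independent Gamma(n/2) variables.
    """
    (s11, s12), (_, s22) = sigma
    b1, b2, b3 = b
    trace = s11 * b1 + s22 * b2 + s12 * b3
    det = (s11 * s22 - s12 ** 2) * (b1 * b2 - b3 ** 2 / 4)
    root = max(trace ** 2 - 4 * det, 0.0) ** 0.5   # discriminant >= 0 up to rounding
    return trace + root, trace - root


def wishart_cf_monte_carlo(sigma, n, t, reps=200_000, seed=1):
    """Empirical c.f. from simulated normal pairs (Cholesky factor of sigma)."""
    (s11, s12), (_, s22) = sigma
    l11 = s11 ** 0.5
    l21 = s12 / l11
    l22 = (s22 - l21 ** 2) ** 0.5
    rng = random.Random(seed)
    acc = 0j
    for _ in range(reps):
        x1 = x2 = x3 = 0.0
        for _ in range(n):
            g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
            z1, z2 = l11 * g1, l21 * g1 + l22 * g2
            x1 += z1 * z1
            x2 += z2 * z2
            x3 += z1 * z2
        acc += cmath.exp(1j * (t[0] * x1 + t[1] * x2 + t[2] * x3))
    return acc / reps


sigma, n, b, s = ((1.0, 0.5), (0.5, 2.0)), 2, (0.3, -0.2, 0.7), 0.4
c1, c2 = gamma_scales(sigma, b)
lhs = wishart_cf(sigma, n, tuple(bi * s for bi in b))   # c.f. of b'X at s
rhs = ((1 - 1j * c1 * s) * (1 - 1j * c2 * s)) ** (-n / 2)
print(c1, c2, abs(lhs - rhs))
```

The Monte Carlo function is slow in pure Python and is meant only as an occasional sanity check of the closed form.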

If Y is a k-dimensional r.v. whose component random variables are independent and i.d. and A is an arbitrary m × k matrix of real constants, X = AY is an i.d. r.v. In what sense is the converse true? That is, if X is an i.d. vector, when does there exist a constant matrix A and a finite set of independent i.d. random variables $Y_1, \dots, Y_k$ such that X ~ AY?

If X is purely normal it is well known that such a representation is always possible. Thus, it suffices to investigate $X^{(2)}$, the Poisson-type r.v. in the canonical representation of X. For if $X^{(2)} \sim A_2Y^{(2)}$ with the $k_2$ components of $Y^{(2)}$ mutually independent, then, since $X^{(1)} \sim A_1Y^{(1)}$ with the $k_1$ components of $Y^{(1)}$ independent, we will have

$$X \sim X^{(1)} + X^{(2)} \sim (A_1, A_2)\begin{pmatrix} Y^{(1)} \\ Y^{(2)} \end{pmatrix} = AY,$$

with the $k = k_1 + k_2$ components of Y independent random variables.

An answer to the question posed is given by 

THEOREM 1. A NSC that a Poisson-type r.v. X ~ AY, where the components $Y_1, \dots, Y_k$ of Y are independent non-degenerate i.d. random variables and A is an m × k matrix of constants, no column of which consists entirely of zeros, is that in (2), $\mu_G$ vanish identically except on k different rays through the origin. Then k is the minimum number of random variables for which a representation X ~ AY, with the components $Y_1, \dots, Y_k$ independent Poisson-type random variables, is possible.

⁵ It is presumed that the assigned distributions of all linear combinations are compatible with the existence of a joint distribution.

Sufficiency. From the hypothesis of Theorem 1, the c.f. of the i.d. vector X may be supposed characterized by $[\gamma, \Sigma, G]$, with $\gamma' = (0, \ldots, 0)$, $\Sigma = \{0\}$, i.e., $X^{(1)} = (0, \ldots, 0)$. By hypothesis, $\mu_G$ assigns positive mass only along k rays, say $R_j$, whose direction cosine vectors are $r_j' = (r_{1j}, r_{2j}, \ldots, r_{mj})$. Let $G_j(s) = \mu_G(A_j^s)$, where $A_j^s = \{u : u \in R^m,\ u' = \rho r_j',\ -\infty < \rho < s\}$, and let $h(t'u, |u|)$ denote the integrand of (2). Then

$$(3)\qquad \int h(t'u, |u|)\,dG(u) = \sum_{j=1}^k \int h(\rho\,t'r_j, |\rho|)\,dG_j(\rho),$$

$$\phi_X(t) = \prod_{j=1}^k \exp\left[\int \left(e^{i(t'r_j)\rho} - 1 - \frac{i(t'r_j)\rho}{1 + \rho^2}\right)\frac{1 + \rho^2}{\rho^2}\,dG_j(\rho)\right] = \prod_{j=1}^k \phi_j(t'r_j),$$

where $\phi_j(t)$ is a univariate i.d. c.f. characterized by $[0, 0, G_j]$.
Let $Y_1, \ldots, Y_k$ be independent i.d. random variables with $Y_j$ having c.f. $\phi_j(t)$ as defined in (3), and take A to be the $m \times k$ matrix whose jth column is $r_j$. Then if $Z = AY$,

$$(4)\qquad \phi_{Z_1, \ldots, Z_m}(t_1, \ldots, t_m) = E[\exp\{it'Z\}] = E[\exp\{i(t'A)Y\}] = \prod_{j=1}^k E[\exp\{i(t'r_j)Y_j\}] = \prod_{j=1}^k \phi_j(t'r_j).$$

Thus, X ~ Z = AY. This representation in terms of the distributions of k inde- 
pendent i.d. random variables (k being the number of rays with positive mass) 
is unique to within a relabelling of the variables and multiplication of each vari- 
able by a nonzero constant. The columns of A must then be adjusted accordingly. 

Necessity. Since $X \sim AY$ with the components of Y independent and non-degenerate, the first equality of (3) holds with $r_j$ equal to the jth column of the given matrix $A = \{a_{ij}\}$ divided by the scalar norming factor $(\sum_{i=1}^m a_{ij}^2)^{1/2}$. Comparing this with (2), it follows from the uniqueness of the i.d. representation that G and $\mu_G$ are as stated in the theorem.

Note that if $k < m$, the mass of the distribution of $X^{(2)}$ is concentrated in a space of lower dimensionality than $R^m$ (i.e., the distribution of X is singular).

A family of distributions $\mathfrak{F} = \{F\}$ has been defined in [7] to be factor-closed if $F = G_1 * G_2$, $F \in \mathfrak{F}$ implies $G_1, G_2 \in \mathfrak{F}$. Then we have as a

COROLLARY. If some component $X_i$ of $X^{(2)}$ has a distribution belonging to a factor-closed family $\mathfrak{F}$, the distributions of $r_{ij}Y_j$ belong to $\mathfrak{F}$, for $j = 1, 2, \ldots, k$.

To avoid trivialities, let all components $X_i$ of X be nondegenerate. Then no row of A is a zero vector. If $X^{(1)} = 0$ and $m = k$, then the components of X (i.e., $X^{(2)}$) are independent if and only if the rows of A may be permuted so as to





RANDOM VECTORS 465 


form a diagonal matrix. This is palpably sufficient; on the other hand, if A cannot be so juggled, some $Y_j$ has nonzero coefficients in two (necessarily independent) linear forms in the independent random variables $Y_i$. But this implies [2] that $Y_j$ is normally distributed. The proof of the theorem, however, shows that when $X^{(1)} = 0$, the $Y_j$ are all purely Poisson type, producing a contradiction.

Let $\alpha_j$ be a k-tuple with 1 in the jth position and zeros elsewhere, $j = 1, \ldots, k$. Then if $k \leq m$ and all $\alpha_j$ belong to the manifold spanned by the m rows of B, no representation of an i.d. vector X in the form BZ with the k components of Z independent but not all i.d. random variables is possible. For in such a case $C_j'B = \alpha_j'$ has a nonzero solution $C_j$ for all $j = 1, 2, \ldots, k$, whence $C_j'X \sim C_j'BZ = \alpha_j'Z = Z_j$. But $C_j'X$, and therefore $Z_j$, is i.d. If, e.g., $k > m$, such a representation is not summarily precluded.

It is, in general, untrue that an m-dimensional random vector Y admits a representation $Y \sim AX$, where the components of X are independent random variables and the matrix A is $m \times k$. This may be seen with the familiar multinomial distribution.

Example. In r independent repetitions of an experiment, let $Y_1, \ldots, Y_{m+1}$ be the number of occurrences, respectively, of the mutually exclusive and exhaustive outcomes $A_1, \ldots, A_{m+1}$ with (single) trial probabilities $p_1, \ldots, p_{m+1}$ ($\sum_{i=1}^{m+1} p_i = 1$); take $Y' = (Y_1, Y_2, \ldots, Y_m)$ and suppose there exists an integer $k \geq 1$ and constant vectors $a_j' = (a_{1j}, \ldots, a_{mj})$ such that $Y \sim AX$ with the components of X independent random variables. Then

$$(5)\qquad \prod_{j=1}^k \phi_j\left(\sum_{\nu=1}^m a_{\nu j}t_\nu\right) = \left[\sum_{\nu=1}^m p_\nu(e^{it_\nu} - 1) + 1\right]^r.$$

Setting $t_\nu = 0$ for $\nu \neq \mu$,

$$\prod_{j=1}^k \phi_j(a_{\mu j}t) = [p_\mu(e^{it} - 1) + 1]^r.$$

Since the classical binomial family is factor-closed [7],

$$\phi_j(a_{\mu j}t) = e^{i\gamma_jt}[p_\mu(e^{it} - 1) + 1]^{r_j}, \qquad 0 \leq r_j \leq r, \qquad \sum_{j=1}^k r_j = r, \qquad \sum_{j=1}^k \gamma_j = 0,$$

so that

$$\phi_j(t) = e^{i\gamma_jt/a_{\mu j}}[p_\mu(e^{it/a_{\mu j}} - 1) + 1]^{r_j}, \qquad j = 1, 2, \ldots, k.$$

Since the left-hand side is independent of μ, so is the right-hand side, whence $p_\mu = p$, $a_{\mu j} = a_j$, $\mu = 1, 2, \ldots, m$. Thus, if the multinomial probabilities are not identical, (5) cannot hold. However, even if $p_\mu = p$, (5) would imply

$$\prod_{j=1}^k \phi_j\left(a_j\sum_{\nu=1}^m t_\nu\right) = \prod_{j=1}^k e^{i\gamma_j\sum_\nu t_\nu}\left[p\left(\exp\left\{i\sum_{\nu=1}^m t_\nu\right\} - 1\right) + 1\right]^{r_j} = \left[\sum_{\nu=1}^m p(e^{it_\nu} - 1) + 1\right]^r,$$

which is impossible since the middle expression is a function of $\sum_{\nu=1}^m t_\nu$ only (and hence the c.f. of a degenerate multivariate distribution), whereas the right-hand side is not.

The following theorem covers the case that the measure $\mu_G$ is not necessarily wholly concentrated on a finite number of rays through the origin.

THEOREM 2. If X is an i.d. random vector, then there exists a sequence of vectors $\{Y_n\}$ consisting of independent i.d. components, and a sequence of finite matrices $\{A_n\}$, such that the distribution of $A_nY_n$ converges to the distribution of X as $n \to \infty$. The components of $Y_n$ can each be taken to be of the form $(Y - b)$, where Y is a Poisson random variable, if X is purely Poisson type.

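PROOF. As earlier, we may suppose $X^{(1)}$ is zero. Let

$$h(u) = \left(e^{it'u} - 1 - \frac{it'u}{1 + |u|^2}\right)\left(\frac{1 + |u|^2}{|u|^2}\right).$$

There is a double sequence of positive constants $\lambda_{n,1}, \ldots, \lambda_{n,k(n)}$, and a double sequence of m-tuples $u_{n,1}, \ldots, u_{n,k(n)}$, $u_{n,i}' = (u_{n,i}^{(1)}, \ldots, u_{n,i}^{(m)})$, such that

$$(6)\qquad \sum_{i=1}^{k(n)} \lambda_{n,i}h(u_{n,i}) \to \int_{R^m} h(u)\,dG(u)$$

as $n \to \infty$. Now $\lambda_{n,i}h(u_{n,i})$ is the log of the c.f. of a random vector

$$(7)\qquad ([Y_{n,i} - b_{n,i}]u_{n,i}^{(1)}, \ldots, [Y_{n,i} - b_{n,i}]u_{n,i}^{(m)})',$$

where the $b_{n,i}$ are appropriately chosen constants, and $Y_{n,i}$ is a Poisson random variable with parameter $\lambda_{n,i}(1 + |u_{n,i}|^2)/|u_{n,i}|^2$. Hence, the left-hand side of (6) is the log of the c.f. of a sum of k(n) vectors of the form (7), where $Y_{n,1}, \ldots, Y_{n,k(n)}$ are mutually independent. In other words, the left-hand side of (6) is the log of the c.f. of the vector

$$\begin{pmatrix} u_{n,1}^{(1)} & \cdots & u_{n,k(n)}^{(1)} \\ \vdots & & \vdots \\ u_{n,1}^{(m)} & \cdots & u_{n,k(n)}^{(m)} \end{pmatrix}\begin{pmatrix} Y^*_{n,1} \\ \vdots \\ Y^*_{n,k(n)} \end{pmatrix},$$

where $Y^*_{n,i} = Y_{n,i} - b_{n,i}$, $i = 1, \ldots, k(n)$, are mutually independent i.d. random variables. This completes the proof.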
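The construction in the proof can be illustrated in the scalar case m = 1. As an assumption for this sketch only, take X to be a Γ-variable with c.f. $(1 - it)^{-\lambda}$; its Lévy measure $\lambda e^{-u}u^{-1}\,du$ on $(0, \infty)$ integrates u near the origin, so no centring constants are needed here. Cutting $(0, \infty)$ into cells on a logarithmic grid (our choice) and placing each cell's mass at one atom gives $\sum_i u_iY_i$, a finite linear combination of independent Poisson variables whose log c.f. should approach $-\lambda\log(1 - it)$. Function names and grid parameters are hypothetical.

```python
import numpy as np

def poisson_atoms_for_gamma(shape, n_atoms=20000, u_min=1e-10, u_max=60.0):
    """Discretize the Gamma Levy measure shape*exp(-u)/u du into atoms.

    In the variable s = log u the measure becomes shape*exp(-u) ds, so each
    cell of width ds contributes an atom at its midpoint u_i carrying mass
    (a Poisson parameter) shape*exp(-u_i)*ds.
    """
    s = np.linspace(np.log(u_min), np.log(u_max), n_atoms + 1)
    ds = s[1] - s[0]
    u = np.exp(0.5 * (s[:-1] + s[1:]))
    rates = shape * np.exp(-u) * ds
    return u, rates

def log_cf_poisson_combination(t, u, rates):
    """log c.f. of sum_i u_i * Y_i with independent Y_i ~ Poisson(rates_i)."""
    return np.sum(rates * (np.exp(1j * t * u) - 1.0))

shape = 1.5
u, rates = poisson_atoms_for_gamma(shape)
for t in (0.5, 1.0, 3.0):
    approx = log_cf_poisson_combination(t, u, rates)
    exact = -shape * np.log(1 - 1j * t)   # log of (1 - it)^(-shape)
    print(t, abs(approx - exact))
```

Refining the grid plays the role of letting $n \to \infty$; for a Lévy measure that does not integrate u near 0, centring constants like the $b_{n,i}$ of (7) would be required.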


4. Multivariate Poisson distribution. Let V denote the set of $2^m - 1$ vertices (excluding the origin) of the unit cube in the first orthant of $R^m$ whose edges lie along the m axes; let $V_j$ signify the vertex with one as the jth coordinate and zeros for the others, $j = 1, 2, \ldots, m$; let $V_{ij}$, $i < j$, represent the vertex with ones for the ith and jth coordinates but zeros for the remaining; ...; finally, let $V_{12\cdots m}$ denote the vertex $(1, 1, \ldots, 1)$.

In (2), define G(u) by $\mu_G(V_j) = a_j \geq 0$, $j = 1, \ldots, m$; $\mu_G(V_{ij}) = a_{ij} \geq 0$, $i < j$; ...; $\mu_G(V_{12\cdots m}) = a_{12\cdots m} \geq 0$; and for any Borel set B of $R^m$, $\mu_G(B) = \mu_G(V \cap B)$, where the measure of the empty set is zero. Then if $z_j = e^{it_j}$, (2) becomes

$$(8)\qquad \phi(t) = \exp\left\{\sum_{j=1}^m a_jz_j + \sum_{i<j} a_{ij}z_iz_j + \cdots + a_{12\cdots m}\prod_{j=1}^m z_j - A_m\right\},$$

where $A_m$ is a constant such that $\phi(0, 0, \ldots, 0) = 1$. The c.f. in (8) is that of the multivariate Poisson distribution discussed in [6]. Since G(u) is of the form prescribed by Theorem 1, with $k = 2^m - 1$, it follows from this theorem (supposing the constants $a_j, a_{ij}, \ldots, a_{12\cdots m}$ strictly positive) that there are $2^m - 1$ random variables $Y_j$ and a constant matrix A such that $X \sim AY$. The matrix A may be chosen so that its $2^m - 1$ columns are the vectors (vertices) of V. By the corollary to Theorem 1, the $Y_j$ are also Poisson distributed with parameters $a_1, a_2, \ldots, a_m;\ a_{12}, a_{13}, \ldots, a_{m-1,m};\ \ldots;\ a_{12\cdots m}$. Since the classical Poisson distribution is not invariant under scale change, the matrix A is uniquely determined to within a permutation of its columns by the stipulation that the $Y_j$ be independent Poisson random variables.

Furthermore, the multivariate Poisson distributions specified in (8) are the only i.d. distributions which are marginally Poisson. For, under this last proviso, G(u) in (2) must be such that the projection of $\mu_G$ on the jth coordinate axis concentrates all mass at the point $(0, \ldots, 0, 1, 0, \ldots, 0)$. This, in turn, requires that $\mu_G$ be as defined in the previous paragraph.
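For m = 2 the representation can be checked directly. The sketch below (Python standard library; names and parameter values are hypothetical) takes A with columns $V_1 = (1, 0)$, $V_2 = (0, 1)$, $V_{12} = (1, 1)$ and independent Poisson variables $Y_1, Y_2, Y_{12}$ with parameters $a_1, a_2, a_{12}$, computes $E\exp\{i(t_1X_1 + t_2X_2)\}$ for $X = AY$ by a truncated triple sum, and compares it with (8). It also checks that the marginal of $X_1$ is Poisson with parameter $a_1 + a_{12}$.

```python
import cmath
import math

def poisson_pmf(n, mean):
    return math.exp(-mean) * mean ** n / math.factorial(n)

def cf_closed_form(t1, t2, a1, a2, a12):
    """Formula (8) with m = 2: exp{a1 z1 + a2 z2 + a12 z1 z2 - A_2}."""
    z1, z2 = cmath.exp(1j * t1), cmath.exp(1j * t2)
    return cmath.exp(a1 * z1 + a2 * z2 + a12 * z1 * z2 - (a1 + a2 + a12))

def cf_from_AY(t1, t2, a1, a2, a12, nmax=40):
    """E exp{i(t1 X1 + t2 X2)} for X = A Y with A = [[1, 0, 1], [0, 1, 1]].

    Y1, Y2, Y12 are independent Poisson(a1), Poisson(a2), Poisson(a12);
    each coordinate sum is truncated at nmax - 1.
    """
    total = 0j
    for y1 in range(nmax):
        p1 = poisson_pmf(y1, a1)
        for y2 in range(nmax):
            p2 = poisson_pmf(y2, a2)
            for y12 in range(nmax):
                x1, x2 = y1 + y12, y2 + y12          # X = A Y
                weight = p1 * p2 * poisson_pmf(y12, a12)
                total += weight * cmath.exp(1j * (t1 * x1 + t2 * x2))
    return total

a1, a2, a12 = 0.7, 1.2, 0.5
print(cf_closed_form(0.3, -1.1, a1, a2, a12), cf_from_AY(0.3, -1.1, a1, a2, a12))
# marginal of X1: Poisson with parameter a1 + a12
t = 0.9
print(cf_closed_form(t, 0.0, a1, a2, a12), cmath.exp((a1 + a12) * (cmath.exp(1j * t) - 1)))
```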

More generally, let $\mathfrak{F} = \{F(x; b_1, \ldots, b_r; c_1, \ldots, c_r)\}$ be a family of univariate i.d. distributions whose c.f.'s are characterized by $[0, 0, G]$ with $\mu_G$ a discrete measure assigning mass $c_h > 0$ to $u = b_h \neq 0$, $h = 1, 2, \ldots, r$. Let X be an i.d. vector with the prescribed marginal distributions $F_{X_j}(x) = F(x; b_1^j, \ldots, b_r^j; c_1^j, \ldots, c_r^j)$, $j = 1, 2, \ldots, m$. Then, as earlier, there is a unique family of i.d. distributions for X having the stated marginals. Its c.f.'s are characterized by $[\gamma, \Sigma, G_m]$, where $\gamma' = (0, 0, \ldots, 0)$, $\Sigma = \{0\}$, and $\mu_{G_m}$ is a discrete measure assigning nonnegative mass to the $(r + 1)^m - 1$ points $(u_1, u_2, \ldots, u_m)$, where $u_j = b_h^j$ for some h or $u_j = 0$ (but the $u_j$ not all zero). Here the independent random variables may be taken to have the classical Poisson distribution and

$$X \sim \sum_{j=1}^k \begin{pmatrix} d_{1j}Y_j \\ d_{2j}Y_j \\ \vdots \\ d_{mj}Y_j \end{pmatrix}.$$

It is degenerate vectors of this form, based on a single Poisson random variable $Y_j$, rather than nondegenerate vectors having the most general multivariate Poisson distribution, that spawn i.d. vectors.

It was pointed out above that if X is a multivariate i.d. vector, all of whose components have Poisson distributions marginally, then X must have a multivariate distribution specified by (8) and $X \sim AY$, where A is a finite matrix and Y is a vector of independent Poisson random variables. The purpose of the next remarks is to indicate that in general a comparable situation does not prevail. For example, suppose that $U_1, U_2, U_3$ are independent gamma variables whose c.f.'s are all $(1 - it)^{-\lambda}$ ($\lambda > 0$). Then

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} U_1 \\ U_2 \\ U_3 \end{pmatrix}$$

is an i.d. vector with c.f. $\{(1 - it_1)(1 - it_2)[1 - i(t_1 + t_2)]\}^{-\lambda}$ whose marginals $X_1, X_2$ are gamma variables. On the other hand, in [3] it is shown that if $|\rho| < 1$, then

$$(9)\qquad [(1 - it_1)(1 - it_2) + \rho^2t_1t_2]^{-2\lambda}$$

is a c.f. for all $\lambda > 0$ (and hence i.d.), and its marginals clearly have the same distribution as do $X_1$ and $X_2$. Thus there is no unique i.d. family having gamma marginals. Suppose $\rho \neq 0$ to avoid the trivial case of independence; then it is easy to verify that (9) cannot be the c.f. of a finite linear combination of independent gamma variables.


5. Connection with stochastic processes. It is a familiar fact that in the one- 
dimensional case the theory of i.d. random variables has a close connection with 
the theory of stochastic processes with independent increments. The analogue for 
multivariate i.d. vectors should be apparent, but it may be worth making some 
of the facts explicit. 

Let U be a random vector whose values are the vectors of the set V defined at the beginning of Section 4. Denote these values by $u_1, \ldots, u_k$ and let their corresponding probabilities be $p_1, \ldots, p_k$, where $k = 2^m - 1$. Let $U_1, U_2, \ldots$ be an infinite sequence of independent random vectors, each distributed as U. Let $X'(t)$ ($t \geq 0$, $X'(0) = 0$) be a Poisson process with stationary, independent increments. It is well known that waiting times for "jumps" in $X'(t)$ are independent, identically distributed exponential random variables. That is, an equivalent way of defining this process is in terms of an infinite sequence of independent, identically distributed random variables $W_1, W_2, \ldots$, such that $P(W_i < w) = \lambda\int_0^w e^{-\lambda y}\,dy$ for $w > 0$ and zero otherwise ($\lambda > 0$), as follows:

$$X'(t) = \begin{cases} 0, & 0 \leq t \leq W_1, \\ 1, & W_1 < t \leq W_1 + W_2, \\ 2, & W_1 + W_2 < t \leq W_1 + W_2 + W_3, \end{cases}$$

etc. Analogously, we can now define a multivariate Poisson process X(t) as follows:

$$X(t) = \begin{cases} 0 \ \text{(zero m-vector)}, & 0 \leq t \leq W_1, \\ U_1, & W_1 < t \leq W_1 + W_2, \\ U_1 + U_2, & W_1 + W_2 < t \leq W_1 + W_2 + W_3, \end{cases}$$

etc. Making use of the well known fact that the conditional distribution of the jump epochs $W_1, W_1 + W_2, \ldots, W_1 + \cdots + W_r$, given that $X'(t) = r$, is that of the ordered values of r independent random variables, each uniformly distributed in (0, t), it is easy to compute that the c.f. of X(t) is

$$\exp\left\{\lambda t\sum_{j=1}^k p_j\left(e^{i\theta'u_j} - 1\right)\right\} = \exp\{\lambda t[C(\theta) - 1]\},$$

where

$$C(\theta) = C(\theta_1, \ldots, \theta_m) = \sum_{j=1}^k e^{i\theta'u_j}p_j$$

is the c.f. of the random vector U and $u_1, \ldots, u_k$ is the set of the k possible values of U. Making use of the material in Section 4, we see that we can choose the $p_j$'s and λ so that X(t) has any prescribed i.d. multivariate Poisson distribution. We remark also that X(t) has independent, stationary increments for exactly the same reasons that X'(t) does.
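A small simulation sketch of this construction for m = 2 (NumPy; the rate, time, probabilities and seed are arbitrary illustrative values, and the function names are hypothetical). Each draw adds a random vertex of V at every exponential waiting time up to t. The sample mean should be near $\lambda tE[U]$ and the empirical c.f. near $\exp\{\lambda t[C(\theta) - 1]\}$, which is (8) with $a_j = \lambda tp_j$.

```python
import numpy as np

def simulate_X(t, lam, vertices, probs, rng):
    """One draw of X(t): add a random vertex U_i at each exponential waiting time."""
    x = np.zeros(vertices.shape[1])
    clock = rng.exponential(1.0 / lam)          # W_1
    while clock <= t:
        x += vertices[rng.choice(len(probs), p=probs)]
        clock += rng.exponential(1.0 / lam)     # next waiting time
    return x

def cf_formula(theta, t, lam, vertices, probs):
    """exp{lam t [C(theta) - 1]} with C(theta) = sum_j p_j exp(i theta'u_j)."""
    C = np.sum(probs * np.exp(1j * (vertices @ theta)))
    return np.exp(lam * t * (C - 1))

rng = np.random.default_rng(12345)
vertices = np.array([[1, 0], [0, 1], [1, 1]])   # the set V for m = 2
probs = np.array([0.5, 0.3, 0.2])
lam, t = 2.0, 1.5
draws = np.array([simulate_X(t, lam, vertices, probs, rng) for _ in range(20000)])
theta = np.array([0.7, -0.4])
emp_cf = np.mean(np.exp(1j * (draws @ theta)))
print(draws.mean(axis=0), lam * t * (probs @ vertices))
print(emp_cf, cf_formula(theta, t, lam, vertices, probs))
```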

Consider now the somewhat more general case in which $U_1, U_2, \ldots$ are independent, identically distributed random vectors (m-tuples) having an arbitrary distribution with c.f.

$$C(\theta) = \int_{R^m} e^{i\theta'u}\,dF(u),$$

where F is the distribution function of $U_1$. If we define X(t) as above but in terms of these more general $U_i$'s, then the c.f. of X(t) is

$$\exp\left\{\lambda t\int_{R^m}\left(e^{i\theta'u} - 1\right)dF(u)\right\}.$$

We recognize this to be a multivariate i.d. c.f. either from the Lévy form or from the fact that X(t) has independent increments. We cannot obtain the most general multivariate i.d. c.f. in this way. On the other hand, we can find a sequence of constant vectors $a_1, a_2, \ldots$ and scalars $b_1, b_2, \ldots$ and distribution functions $F_1, F_2, \ldots$ such that if $X_n(t)$ is determined by $F_n$ as above, then as $n \to \infty$,

$$(X_n(t) - a_n)/b_n$$







converges in law to an arbitrary Poisson i.d. vector. Thus, the most general 
Poisson-type i.d. vector can be approximately obtained in terms of a Poisson- 
like stochastic process with independent exponential waiting times between 
“jumps” and whose “jumps” are random vectors. 


REFERENCES 

[1] Salomon Bochner, Harmonic Analysis and the Theory of Probability, University of California Press, Berkeley and Los Angeles, 1955.

[2] G. Darmois, "Sur une propriété caractéristique de la loi de probabilité de Laplace," C. R. Acad. Sci. Paris, Vol. 232 (1951), pp. 1999-2000.

[3] A. S. Krishnamoorthy and M. Parthasarathy, "A multivariate gamma-type distribution," Ann. Math. Stat., Vol. 22 (1951), pp. 549-557.

[4] Paul Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1937, pp. 214-221.

[5] Paul Lévy, "Arithmetical character of Wishart distribution," Proc. Cambridge Philos. Soc., Vol. 44 (1948), pp. 295-297.

[6] Henry Teicher, "On the multivariate Poisson distribution," Skand. Aktuarietids. (1954), pp. 1-9.

[7] Henry Teicher, "On the factorization of distributions," Ann. Math. Stat., Vol. 25 (1954), pp. 769-774.





A MOVING SINGLE SERVER PROBLEM 


By B. McMillan and J. Riordan
Bell Telephone Laboratories, Incorp., New York 


1. Introduction. An assembly line moving with uniform speed has items for service spaced along it. The single server available moves with the line while serving and against it with infinite velocity while transferring service to the next item in line. The line has a barrier in which the server may be said to be "absorbed" in the sense that service is disabled if the server moves into the barrier. The problem solved here is the following: given that a server with exponentially distributed service time starts service on the first item when it is T time units away from the barrier, what is the probability p(k, T) that it completes k items of service before absorption? This is the same as determining the generating function

$$(1)\qquad P(x, T) = \sum_{k=0}^\infty p(k, T)x^k.$$
io 


The referee has pointed out to us an identification of this problem with that of 
finding the number of units of service in a busy period for the usual (stationary) 
single server. This may be seen as follows. 

Take r(t) as the distance from the barrier at time t, so that r(0) = T. Take the spacing between items as an independent random variable with distribution function B(t). Then the graph of r(t) as in Fig. 1 consists of lines of unit (negative) slope interrupted by jumps having the distribution B(t) and occurring at t-epochs determined by the exponential distribution of service time. The graph ends when r(t) = 0 for the first time, when service is disabled.

Now consider the queueing system with a single server, Poisson arrivals, and distribution of service times B(t). Take r(t) as the waiting time of a virtual arrival at time t. Then the graph of r(t) for a single busy period of the server is exactly as in Fig. 1 if the first customer served has a service time which is given to be T.

Note that one problem is turned into the other by interchanging service and 
arrival variables. 

Busy periods were first considered by E. Borel [2] for the case of constant 
service time and with main interest in the number served, exactly as here, but 
with the first customer’s service time the same constant as all others. Turning to 
the length of the busy period, D. G. Kendall [4] generalized Borel’s result to 
arbitrary service time distribution by transforming it into a question concerning 
a branching process. Kendall’s functional equation was carefully derived by L. 
Takacs [7], who also obtained a similar equation for the generating function for 
the number served in a busy period (with no condition on the first customer) for 


Received July 5, 1956; revised October 5, 1956. 
471 





B. MCMILLAN AND J. RIORDAN 


Fig. 1. Sample behavior of random variable r(t): r(t) is the distance from the barrier at time t (moving server), or the waiting time of a virtual arrival at time t (queueing system).


arbitrary distribution of service time. All of these are under the usual assumption of Poisson arrivals.

Takács' result (l.c. Theorem 7, p. 120) in present notation is as follows:

THEOREM (Takács). If the generating function of number served in a busy period is

$$F(x) = x\int_0^\infty P(x, T)\,dB(T),$$

and if arrivals are Poisson with average a in unit time, then

$$(3)\qquad F(x) = x\int_0^\infty \exp[-at(1 - F(x))]\,dB(t).$$

This suggests that the conditional generating function P(x, T) satisfies

$$(4)\qquad P(x, T) = \exp\left\{-aT\left[1 - x\int_0^\infty P(x, t)\,dB(t)\right]\right\},$$

which deserves an independent derivation. It is clear that it is not a simple consequence of (3), since in the case of constant service time equal to ε,

$$F(x) = xP(x, \varepsilon),$$

and this cannot possibly determine P(x, T) for arbitrary T. Nevertheless Eq. (4) is correct.

Because of this, we retain our original derivation of P(x, T), which is limited to the two extreme cases (of most interest to us), namely (i) constant spacing

$$(5)\qquad B(t) = \begin{cases} 0, & t < \varepsilon, \\ 1, & t \geq \varepsilon, \end{cases}$$

and (ii) random spacing





SINGLE SERVER PROBLEM 


$$(6)\qquad B(t) = 1 - e^{-\beta t}, \qquad t \geq 0,$$

for both of which we take the service distribution as

$$(7)\qquad A(t) = \begin{cases} 0, & t < 0, \\ 1 - e^{-at}, & t \geq 0. \end{cases}$$

The average service time is 1/a. We show that (4) is true in both cases and obtain explicit expressions for the probabilities p(k, T) and for their moments.


2. Uniform spacing. The probability p(0, T) that service on the first item, begun T time units away from the barrier, is not completed before absorption is the probability that the service time is greater than T; hence

$$(8)\qquad p(0, T) = 1 - A(T) = e^{-aT}.$$


For the other probabilities p(k, T), k = 1, 2, ..., a recurrence may be found as follows. Suppose service on the first item is completed when the item is in the interval (t, t + dt) time units away from the barrier; then service is begun on the second item at a point t + ε time units away from the barrier, and it follows at once that

$$(9)\qquad p(k, T) = \int_0^T p(k - 1, t + \varepsilon)\,dA(T - t) = a\int_0^T p(k - 1, t + \varepsilon)e^{-a(T - t)}\,dt.$$


Then, using Eq. (1), the generating function P(x, T) must satisfy

$$(10)\qquad P(x, T) = e^{-aT} + ax\int_0^T P(x, t + \varepsilon)e^{-a(T - t)}\,dt.$$


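Suppose that this has a solution $e^{-\lambda T}$, $\lambda = \lambda(x; a, \varepsilon)$; then (10) shows that

$$(11)\qquad e^{-\lambda T} - e^{-aT} = axe^{-\lambda\varepsilon}(e^{-\lambda T} - e^{-aT})/(a - \lambda),$$

or

$$(11a)\qquad a - \lambda = axe^{-\lambda\varepsilon}.$$

But this is what (4) becomes when B(t) is given by (5) and $P(x, T) = e^{-\lambda T}$.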

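Notice that for x = 0, λ = a, as is required by (8). Note also that all probabilities p(k, T), k = 1, 2, ..., are uniquely determined by (8) and (9), and that P(x, T) is an analytic function for |x| < 1. To determine it, rewrite (11a) in the form

$$(a - \lambda)\varepsilon = a\varepsilon xe^{-\lambda\varepsilon},$$

or, what is the same thing,

$$(12)\qquad (a - \lambda)\varepsilon\,e^{-(a - \lambda)\varepsilon} = a\varepsilon e^{-a\varepsilon}x.$$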







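This is an equation familiar in Lagrange series expansions, and in fact the expansion for $\exp(zT/\varepsilon) = \exp(aT - \lambda T)$, where $z = (a - \lambda)\varepsilon$, is given by Pólya and Szegő [5] (III Abschnitt, p. 210) in the form

$$\exp(aT - \lambda T) = 1 + \sum_{k=1}^\infty \frac{T}{\varepsilon}\left(\frac{T}{\varepsilon} + k\right)^{k-1}\frac{(a\varepsilon e^{-a\varepsilon}x)^k}{k!},$$

or

$$(13)\qquad \exp(-\lambda T) = e^{-aT} + \sum_{k=1}^\infty \frac{T(T + k\varepsilon)^{k-1}}{k!}(ae^{-a\varepsilon})^ke^{-aT}x^k.$$

Hence

$$(14)\qquad p(k, T) = \frac{T(T + k\varepsilon)^{k-1}}{k!}(ae^{-a\varepsilon})^ke^{-aT},$$

a result which may also be obtained from (9) and mathematical induction.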
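A Monte Carlo sketch of the moving-server model (Python standard library; the parameter values, seed and function names are illustrative assumptions, and aε < 1 is assumed so that absorption is certain). It follows the distance to the barrier from one service completion to the next and compares empirical frequencies of the number served with (14).

```python
import math
import random

def served_before_absorption(T, a, eps, rng):
    """Number of items completed with constant spacing eps and Exp(a) service times."""
    r, k = T, 0                    # r: time units between current item and barrier
    while True:
        s = rng.expovariate(a)     # service time of the current item
        if s > r:                  # barrier reached before service ends
            return k
        r = r - s + eps            # next item starts eps further from the barrier
        k += 1

def p_formula(k, T, a, eps):
    """Eq. (14): T (T + k eps)^(k-1) (a e^{-a eps})^k e^{-aT} / k!."""
    return (T * (T + k * eps) ** (k - 1) * (a * math.exp(-a * eps)) ** k
            * math.exp(-a * T) / math.factorial(k))

rng = random.Random(2024)
T, a, eps, n = 1.0, 1.0, 0.5, 200_000
counts = {}
for _ in range(n):
    k = served_before_absorption(T, a, eps, rng)
    counts[k] = counts.get(k, 0) + 1
for k in range(4):
    print(k, counts.get(k, 0) / n, p_formula(k, T, a, eps))
```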
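For the probability P(1, T) of absorption, (12) becomes

$$(15)\qquad (a - \lambda)\varepsilon\,e^{-(a - \lambda)\varepsilon} = a\varepsilon e^{-a\varepsilon}.$$

The function $\phi(z) = ze^{-z}$ of the real variable z is zero for z zero, increases to a maximum $e^{-1}$ at z = 1 and decreases monotonically to zero; hence the equation $\phi(z) - a\varepsilon e^{-a\varepsilon} = 0$ has two positive real roots when $a\varepsilon \neq 1$ (for then $a\varepsilon e^{-a\varepsilon} < e^{-1}$), and in the present instance, Eq. (15), because probabilities are in question, the smaller is the proper one. For $a\varepsilon < 1$, this root is $a\varepsilon$ itself; otherwise it is denoted by $z_0$. Hence

$$(16)\qquad P(1, T) = \begin{cases} 1, & a\varepsilon \leq 1, \\ \exp[-(a - z_0/\varepsilon)T], & a\varepsilon > 1. \end{cases}$$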
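It is interesting to notice that the first of these may be verified as follows. Rewrite (14) as

$$(14a)\qquad p(k, T) = \frac{T}{T + k\varepsilon}\cdot\frac{[a(T + k\varepsilon)]^k}{k!}e^{-a(T + k\varepsilon)}.$$

Then, by a result given by Jensen [3], namely

$$\sum_{k=0}^\infty \frac{(s + kw)^k}{k!}e^{-(s + kw)} = \frac{1}{1 - w}, \qquad 0 \leq w < 1,$$

and (14a), it follows that

$$P(1, T) = \frac{1}{1 - a\varepsilon} - \frac{a\varepsilon}{1 - a\varepsilon} = 1, \qquad a\varepsilon < 1.$$

Jensen's result may also be used with (14) to show that

$$(17)\qquad M(T) = \sum_k kp(k, T) = aT(1 - a\varepsilon)^{-1}.$$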
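The following deterministic sketch sums (14) numerically (in logarithms, to avoid overflow for large k) and compares the results with (16), (17), and the variance $aT(1 - a\varepsilon)^{-3}$ stated later in this section. Finding the smaller root $z_0$ of $ze^{-z} = a\varepsilon e^{-a\varepsilon}$ by bisection on (0, 1) is our choice of method; the function names and parameter values are hypothetical.

```python
import math

def log_p(k, T, a, eps):
    """log of Eq. (14), computed in logs to avoid overflow for large k."""
    return (math.log(T) + (k - 1) * math.log(T + k * eps)
            + k * (math.log(a) - a * eps) - a * T - math.lgamma(k + 1))

def smaller_root(c):
    """Smaller positive root of z e^{-z} = c (0 < c < 1/e), by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(-mid) < c:   # z e^{-z} is increasing on (0, 1)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def absorption_prob(T, a, eps):
    """Eq. (16)."""
    if a * eps <= 1:
        return 1.0
    z0 = smaller_root(a * eps * math.exp(-a * eps))
    return math.exp(-(a - z0 / eps) * T)

def sums_of_14(T, a, eps, K=400):
    """Total mass, mean and variance of the (truncated) sequence p(k, T), k < K."""
    p = [math.exp(log_p(k, T, a, eps)) for k in range(K)]
    total = sum(p)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    return total, mean, var

total, mean, var = sums_of_14(1.0, 1.0, 0.5)
print(total, mean, var)          # should be close to 1, 2 and 8 (aT = 1, a*eps = 0.5)
total2, _, _ = sums_of_14(1.0, 2.0, 1.0)
print(total2, absorption_prob(1.0, 2.0, 1.0))   # a*eps = 2: defective distribution
```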


For higher moments, two courses are open. First, since

$$(18)\qquad P(1 + x, T) = \sum_{k=0}^\infty x^kM_{(k)}(T)/k! = M(x, T),$$







with $M_{(k)}(T)$ the kth factorial moment, it follows from (10) that

$$(19)\qquad M(x, T) = e^{-aT} + a(1 + x)\int_0^T M(x, t + \varepsilon)e^{-a(T - t)}\,dt.$$

By differentiation

$$(20)\qquad \partial M(x, T)/\partial T = a[M(x, T + \varepsilon) - M(x, T) + xM(x, T + \varepsilon)];$$

hence, equating powers of x, with a prime denoting a derivative,

$$(21)\qquad M_{(k)}'(T) = aM_{(k)}(T + \varepsilon) - aM_{(k)}(T) + akM_{(k-1)}(T + \varepsilon),$$

a differential recurrence relation which may be solved step by step, and which is satisfied by $M_{(1)}(T) = M(T)$, where M(T) is given by (17). The next case is

$$M_{(2)}'(T) = aM_{(2)}(T + \varepsilon) - aM_{(2)}(T) + 2a^2(T + \varepsilon)(1 - a\varepsilon)^{-1},$$

and it turns out that

$$M_{(2)}(T) = aT(a\varepsilon)(2 - a\varepsilon)(1 - a\varepsilon)^{-3} + M^2(T).$$
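Second, from (12) by Lagrangian inversion (cf. [5], l.c., p. 209), with $w = a\varepsilon e^{-a\varepsilon}x$,

$$z = w + \frac{2w^2}{2!} + \cdots + \frac{n^{n-1}w^n}{n!} + \cdots$$

and

$$\begin{aligned} \exp(aT - \lambda T) &= \exp\left(xuT + \frac{2\varepsilon T(xu)^2}{2!} + \cdots + \frac{(n\varepsilon)^{n-1}T(xu)^n}{n!} + \cdots\right) \\ &= \sum_n (xu)^nY_n(y_1, y_2, \ldots, y_n)/n! \\ &= \exp(xuY), \ \text{symbolically}, \end{aligned}$$

with $u = ae^{-a\varepsilon}$, $Y_n(y_1, y_2, \ldots, y_n)$ a multivariable polynomial introduced by Bell [1], $y_n = (n\varepsilon)^{n-1}T$, and in the symbolic abbreviation the usual convention $Y^k \equiv Y_k(y_1, y_2, \ldots, y_k)$ is followed. (The relation used in the second and third lines of this display may be regarded as a definition of the Y polynomials.) Then

$$M(x, T) = \exp[(1 + x)uY - aT], \ \text{symbolically},$$

and again for $a\varepsilon < 1$

$$\begin{aligned} M_{(k)}(T) &= e^{-aT}u^kD^k\exp(uY), \qquad D = d/du, \\ &= e^{-aT}u^kD^k\exp(aT), \\ &= Y_k(Tu\alpha_1, Tu^2\alpha_2, \ldots, Tu^k\alpha_k), \qquad \alpha_k = D^ka, \end{aligned}$$

the second line following from M(0, T) = 1 (a being regarded as a function of u through $u = ae^{-a\varepsilon}$), the third from the development in [6].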


The derivatives $\alpha_k$ are readily calculated; indeed, from the initial values

$$u\alpha_1 = a(1 - a\varepsilon)^{-1}, \qquad u^2\alpha_2 = a(a\varepsilon)(2 - a\varepsilon)(1 - a\varepsilon)^{-3}$$







and mathematical induction, it is found that

$$(24)\qquad u^k\alpha_k = a(1 - v)^{1-2k}q_k(v), \qquad v = a\varepsilon,$$

with

$$q_{k+1}(v) = [1 - k + (4k - 2)v - kv^2]q_k(v) + v(1 - v)q_k'(v), \qquad q_1(v) = 1,$$

the prime indicating a derivative.

It may be noticed that the variance of the number served is given by

$$\mathrm{var} = M_{(2)}(T) + M(T) - M^2(T) = aT(1 - a\varepsilon)^{-3}.$$
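To check (24) against (14), the sketch below builds the polynomials $q_k$ from the recurrence, forms $x_j = Tu^j\alpha_j = Ta(1 - v)^{1-2j}q_j(v)$, evaluates the complete Bell polynomials $Y_k$ by the standard recurrence $Y_{n+1} = \sum_{i=0}^n \binom{n}{i}Y_{n-i}x_{i+1}$ (our choice of evaluation method), and compares the resulting factorial moments with direct sums over (14). Function names and parameter values are hypothetical; $a\varepsilon < 1$ is assumed.

```python
import math

def q_polys(kmax):
    """Coefficient lists (index = power of v) of q_1..q_kmax from the recurrence after (24)."""
    q = [[1.0]]                                   # q_1(v) = 1
    for k in range(1, kmax):
        cur = q[-1]
        nxt = [0.0] * (len(cur) + 2)
        for i, c in enumerate(cur):
            # [1 - k + (4k - 2) v - k v^2] * c v^i
            nxt[i] += (1 - k) * c
            nxt[i + 1] += (4 * k - 2) * c
            nxt[i + 2] -= k * c
            # v (1 - v) * d/dv (c v^i) = i c (v^i - v^(i+1))
            nxt[i] += i * c
            nxt[i + 1] -= i * c
        q.append(nxt)
    return q

def bell_complete(xs):
    """Complete Bell polynomials Y_0..Y_K evaluated at xs = [x_1, ..., x_K]."""
    Y = [1.0]
    for n in range(len(xs)):
        Y.append(sum(math.comb(n, i) * Y[n - i] * xs[i] for i in range(n + 1)))
    return Y

def factorial_moments_via_24(T, a, eps, K):
    v = a * eps
    q = q_polys(K)
    xs = [T * a * (1 - v) ** (1 - 2 * j) * sum(c * v ** i for i, c in enumerate(q[j - 1]))
          for j in range(1, K + 1)]              # x_j = T u^j alpha_j
    return bell_complete(xs)[1:]                 # M_(1), ..., M_(K)

def factorial_moments_via_14(T, a, eps, K, kmax=400):
    out = []
    for r in range(1, K + 1):
        s = 0.0
        for k in range(r, kmax):
            logp = (math.log(T) + (k - 1) * math.log(T + k * eps)
                    + k * (math.log(a) - a * eps) - a * T - math.lgamma(k + 1))
            s += math.perm(k, r) * math.exp(logp)   # falling factorial k(k-1)...(k-r+1)
        out.append(s)
    return out

m24 = factorial_moments_via_24(1.0, 1.0, 0.5, 4)
m14 = factorial_moments_via_14(1.0, 1.0, 0.5, 4)
print(m24)
print(m14)
```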

3. Random spacing. As before, $p(0, T) = e^{-aT}$, and the other probabilities are obtained by a recurrence derived as follows. Suppose service on the second item is begun when it is in the interval (S, S + dS) time units away from the barrier; the probability of this event is, with $\alpha = a\beta/(a + \beta)$,

$$\beta\,dS\int_0^S e^{-\beta(S - t)}ae^{-a(T - t)}\,dt = \alpha(e^{aS} - e^{-\beta S})e^{-aT}\,dS, \qquad S < T,$$

$$\beta\,dS\int_0^T e^{-\beta(S - t)}ae^{-a(T - t)}\,dt = \alpha e^{-\beta S}(e^{\beta T} - e^{-aT})\,dS, \qquad S > T.$$

Hence, just as with (9),

$$(25)\qquad p(k, T) = \alpha e^{-aT}\int_0^T(e^{aS} - e^{-\beta S})p(k - 1, S)\,dS + \alpha(e^{\beta T} - e^{-aT})\int_T^\infty e^{-\beta S}p(k - 1, S)\,dS, \qquad k \geq 1.$$

It may be noticed for verification that

$$p(1, T) = \alpha Te^{-aT}, \qquad p(2, T) = \alpha^2Te^{-aT}/(a + \beta) + \alpha^2T^2e^{-aT}/2!.$$
The probability generating function P(x, T), defined by (1), has the recurrence

$$(26)\qquad P(x, T) = e^{-aT} + \alpha xe^{-aT}\int_0^T(e^{aS} - e^{-\beta S})P(x, S)\,dS + \alpha x(e^{\beta T} - e^{-aT})\int_T^\infty e^{-\beta S}P(x, S)\,dS.$$

Trying an exponential solution

$$P(x, T) = e^{-\lambda T}, \qquad \lambda = \lambda(x; a, \beta),$$

leads to the conditional (quadratic) equation

$$(27)\qquad (a - \lambda)(\beta + \lambda) = a\beta x,$$







which again agrees with (4) when B(t) is given by (6) and $P(x, T) = \exp(-\lambda T)$ as above. The solution of (27) is

$$2\lambda = a - \beta \pm [(a + \beta)^2 - 4a\beta x]^{1/2}.$$

The positive sign must be chosen since it leads to λ(0) = a and P(x, T) ≤ 1 for x ≤ 1. Hence

$$(28)\qquad P(x, T) = \exp\left\{-\frac{T}{2}\left[a - \beta + \sqrt{(a + \beta)^2 - 4a\beta x}\right]\right\}.$$

It follows at once (taking the positive square root) that

$$(29)\qquad P(1, T) = \begin{cases} 1, & a \leq \beta, \\ e^{-(a - \beta)T}, & a \geq \beta. \end{cases}$$

The probabilities p(k, T) can be obtained easily from the generating function by noting that its second derivative may be written as

$$(30)\qquad [(a + \beta)^2 - 4a\beta x]P''(x, T) = 2a\beta P'(x, T) + (a\beta T)^2P(x, T).$$

From this follows the recurrence

$$(31)\qquad (k + 2)(k + 1)p(k + 2, T) = (2k + 2)(2k + 1)(\alpha^2/a\beta)p(k + 1, T) + \alpha^2T^2p(k, T).$$
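A sketch (Python standard library) that generates p(k, T) from p(0, T), p(1, T) and recurrence (31), checks p(2, T) against the explicit value given after (25), checks that the probabilities sum to 1 with mean $aT/(1 - a/\beta)$ when $a < \beta$ (the mean stated at the end of this section), and compares them with a simulation of the moving server with exponentially distributed spacing. Parameter values, seed and function names are illustrative assumptions.

```python
import math
import random

def p_random_spacing(T, a, beta, K):
    """p(0..K-1, T) from p(0) = e^{-aT}, p(1) = alpha T e^{-aT} and recurrence (31)."""
    alpha = a * beta / (a + beta)
    ratio = alpha ** 2 / (a * beta)              # the factor alpha^2 / (a beta) in (31)
    p = [math.exp(-a * T), alpha * T * math.exp(-a * T)]
    for k in range(K - 2):
        p.append(((2 * k + 2) * (2 * k + 1) * ratio * p[k + 1]
                  + (alpha * T) ** 2 * p[k]) / ((k + 2) * (k + 1)))
    return p[:K]

def simulate_served(T, a, beta, rng):
    """Moving server with Exp(beta) spacing between items and Exp(a) service times."""
    r, k = T, 0
    while True:
        s = rng.expovariate(a)                   # service time
        if s > r:                                # absorbed at the barrier
            return k
        r = r - s + rng.expovariate(beta)        # random spacing to the next item
        k += 1

T, a, beta = 1.0, 1.0, 2.0
p = p_random_spacing(T, a, beta, 400)
alpha = a * beta / (a + beta)
p2_text = alpha ** 2 * T * math.exp(-a * T) / (a + beta) + alpha ** 2 * T ** 2 * math.exp(-a * T) / 2
print(sum(p), sum(k * pk for k, pk in enumerate(p)), a * T / (1 - a / beta))
print(p[2], p2_text)

rng = random.Random(7)
n = 200_000
sims = [simulate_served(T, a, beta, rng) for _ in range(n)]
for k in range(4):
    print(k, sims.count(k) / n, p[k])
```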


For an explicit expression, write

$$p(k, T) = e^{-aT}\sum_{j=0}^{k-1}A_{kj}\frac{(\alpha T)^{k-j}b^j}{(k - j)!}, \qquad b = \alpha^2/a\beta;$$

then the numbers $A_{kj}$ are determined by the generating function recurrence


(1 — x)A;(x) = (i — x) > Ajj’ 


2k — k 
A; (x) — (P72. 


Similarly, factorial moments are obtained from the following relation for the derivatives of $M(x, T) = P(1 + x, T)$:

$$(32)\qquad [(a - \beta)^2 - 4a\beta x]M''(x, T) = 2a\beta M'(x, T) + (a\beta T)^2M(x, T),$$

which leads to the recurrence

$$(33)\qquad M_{(k+2)}(T) = (4k + 2)[a\beta/(a - \beta)^2]M_{(k+1)}(T) + [(a\beta T)^2/(a - \beta)^2]M_{(k)}(T).$$

Hence

$$M_{(k)}(T) = k!\sum_{j=0}^{k-1}A_{kj}\frac{(cT)^{k-j}d^j}{(k - j)!},$$

with the numbers $A_{kj}$ as above, and $c = a\beta/(\beta - a)$, $d = a\beta/(\beta - a)^2$.







The mean and variance of the number served are

$$M(T) = \frac{a\beta T}{\beta - a} = \frac{aT}{1 - a/\beta},$$

$$\mathrm{var}(T) = \frac{a\beta(\beta^2 + a^2)T}{(\beta - a)^3} = \frac{aT[1 + (a/\beta)^2]}{(1 - a/\beta)^3}.$$

Note the similarity to the corresponding results for uniform spacing.

Thanks are due to E. N. Gilbert for a thoroughgoing review of an earlier draft, 
and to the referee for his stimulating identification of two apparently distinct 
problems. 


REFERENCES 
[1] E. T. Bell, "Exponential polynomials," Ann. Math., Vol. 35 (1934), pp. 258-277.
[2] E. Borel, "Sur l'emploi du théorème de Bernoulli pour faciliter le calcul d'une infinité de coefficients. Application au problème de l'attente à un guichet," C. R. Acad. Sci. Paris, Vol. 214 (1942), pp. 452-456.
[3] J. L. W. V. Jensen, "Sur une identité d'Abel et sur d'autres formules analogues," Acta Math., Vol. 26 (1902), pp. 307-318.
[4] D. G. Kendall, "Some problems in the theory of queues," J. Roy. Stat. Soc., Ser. B, Vol. 13 (1951), pp. 151-173.
[5] G. Pólya and G. Szegő, Aufgaben und Lehrsätze aus der Analysis, I., New York, 1945.
[6] J. Riordan, "Derivatives of composite functions," Bull. Amer. Math. Soc., Vol. 52 (1946), pp. 664-667.
[7] L. Takács, "Investigation of waiting time problems by reduction to Markov processes," Acta Math. Acad. Sci. Hungaricae, Vol. 6 (1955), pp. 101-129.





SOME FURTHER METHODS OF CONSTRUCTING REGULAR 
GROUP DIVISIBLE INCOMPLETE BLOCK DESIGNS 


By G. H. Freeman 


East Malling Research Station, Maidstone, England 


1. Summary. Some further methods are given for the construction of regular 
group divisible incomplete block designs, and designs derivable by these meth- 
ods are tabulated. The methods are (i) designs containing complete and incom- 
plete groups; (ii) designs with groups arranged in sets; and (iii) designs de- 
rivable by addition. In the first two of these methods, which are related, it 
may be possible to avoid having to take all the blocks that the full procedure 
would indicate. 


2. Introduction. An incomplete block design with r replicates of v treatments on b blocks with k plots in each is said to be group divisible if it contains m groups of n treatments each, where mn = v, in such a way that treatments in the same group concur in $\lambda_1$ blocks and treatments in different groups concur in $\lambda_2$ blocks, where $\lambda_1 \neq \lambda_2$. For all group divisible designs the following relationships hold: $bk = vr$, $\lambda_1(n - 1) + \lambda_2n(m - 1) = r(k - 1)$, $r \geq \lambda_1$, $rk \geq \lambda_2v$. Further, the efficiency factors are as follows:

within groups, $E_1 = \dfrac{n\{\lambda_1 + (m - 1)\lambda_2\}}{rk}$,

between groups, $E_2 = \dfrac{\lambda_2v}{rk} = \dfrac{m\lambda_2}{\lambda_1 + (m - 1)\lambda_2}E_1$.

Group divisible designs have been classified into three types by Bose and Connor [2]: (i) singular designs for which $r = \lambda_1$; (ii) semi-regular designs for which $r > \lambda_1$, $rk = \lambda_2v$; (iii) regular designs for which $r > \lambda_1$, $rk > \lambda_2v$. It is the purpose of this paper to present some unpublished methods for the construction of regular group divisible designs, and to give examples of the designs obtained by these methods.

Methods for the construction of group divisible incomplete block designs 
have been given by Bose, Shrikhande, and Bhattacharya [3], and tables of 
such designs, inter alia, have been prepared by Bose, Clatworthy, and Shrik- 
hande [1]. The designs derived here are of the following kinds: (i) designs con- 
taining complete and incomplete groups; (ii) designs with groups arranged 
in sets; (iii) designs derivable by addition. The first two of these are related 
in that designs of a very general nature which are generalizations of the first 
kind are also, in one sense, generalizations of the second. As will be seen, these 
very general designs tend to require extremely large numbers of replicates and 
plots per block, and so they are not considered in any great detail below. An- 


Received August 1, 1956. 


479 





480 G. H. FREEMAN 


other feature common to the first two kinds of design is that in both cases 
designs are possible which have only a fraction of the number of blocks and rep- 
licates necessary for the complete design. The third method of deriving de- 
signs, by addition, is unrelated to the other two; it is a slight generalization of 
a method given by Bose, et al. [3]. Apart from a few isolated examples, none 
of the designs given here appears in the tables of Bose, et al. [1]. 


3. Complete and incomplete groups. The first method, that of complete and incomplete groups, arises in the following manner. Consider a design with m groups of n members each, where mn = v. Let each block of the design contain u complete groups ($1 \leq u \leq m - 1$) and h members from one other group ($1 \leq h \leq n - 1$), the design containing sufficient blocks that the h extra treatments shall be selected in all possible ways from each of the (m − u) groups not wholly represented. This necessitates ${}^nC_h(m - u)$ blocks with the same u groups, and thus ${}^mC_u\,{}^nC_h(m - u)$ blocks in all. The complete design thus has ${}^{m-1}C_u\,{}^nC_h[(nu + h)/n]$ replicates of mn treatments on ${}^mC_u\,{}^nC_h(m - u)$ blocks with (nu + h) plots per block.

By the method of its construction the design is group divisible, save where u = m − 1 and h = n − 1 simultaneously, in which case it degenerates into a totally balanced incomplete block design. Ignoring this case, it is seen that the other parameters of the first kind are $\lambda_1 = {}^{m-1}C_u\{u\,{}^nC_h + {}^{n-2}C_{h-2}\}$ and $\lambda_2 = {}^{m-2}C_{u-1}\{(u - 1)\,{}^nC_h + 2\,{}^{n-1}C_{h-1}\}$. Thus the design satisfies the conditions $r > \lambda_1$, $rk > \lambda_2v$, where r is the number of replicates and k the number of plots per block, and so is regular.

As an illustration of the construction of the design consider the case m 2, 
u=i1,n = 4,h = 2, giving a design with 9 replicates of 8 treatments on 12 
blocks with 6 plots in each, with 4; = 7, A» = 6. The design is as follows. The 
two groups are ABCD and EFGH and the blocks are given by columns. 


A 2 RR aD as Se Uwe 
B B B , 2. Ts oe 
oS es ome ae Se a eS 
DDD ne EB 2 as 8 

| =» Ss F nb AB a eo © 

G #8 Gf B yw we 


All designs of this type with r < 10 are shown in Table I. 

In certain circumstances it is possible to obtain a regular design with the 
same m, u, n and h without taking as many blocks as would be implied by the 
designs just described. For example, when m = 3, u = 1, designs with just 
half the blocks are possible by taking blocks with the first group of treatments 
complete and the second incomplete, the second complete and the third in- 
complete, and the third complete and the first incomplete. There are then 
"Cil(n + h)/n] replicates of 3n treatments on 3"C, blocks with (n + h) plots 
per block, Ay = "Cyr + "Che, 2 = ” 'Cy-1. As an illustration, consider n 


4, h = 1. The design is as follows, and here r = k = 5,» = 6b = 12,xy = 4, 





INCOMPLETE BLOCK DESIGNS 481 


he = 1. The groups are ABCD, EFGH, and IJKL, and the blocks are given 
by columns. 


A &-a- 4-1 I 

BB BF F Jd ‘ 
C0 0468.8. @ Sate 2s 
PD. Aa L L 
: 1G: Bf 2 BC D 


All designs of this type with r S 10 are given in Table II. 

Half designs are possible on this same principle for any odd value of m when 
u = 1; in such designs those blocks are taken in which one group is complete 
and another incomplete, but not conversely for the same pair of groups. As an 
illustration, m = 5, n = 3, h = 1 gives the following design with 8 replicates 


of 15 treatments on 30 blocks with 4 plots each, Ay = 6, Ax = 1: 
ID DGGGIJI I IMMMAAAGGGMMMODODDJIJIJ 
EHHHKKKNNNBBBHHHNNNEEEKKK 
riETripne.we OSGCOCLDI EL TO Oe ¥.F Pi.oe 
IJ KLMNOABCGHIMNODEF JKLA BC 


DI 
EE 
FF 
) ‘GH 
The groups are ABC, DEF, GHI, JKL, and MNO, and the biocks are given 
by columns. For u = 1 all the possible half designs with 10 or fewer replicates 


are given below: 


h rT v b C A 2 E, E, 
6 10 20 ‘ 0.89 0.68 
s 15 30 j 0.94 0.70 


l 
l 
I 10 20 é 0.96 0.71 
2 
I 


m 


= 


5D 


0 


10 15 30 ‘ ; ; 0.96 0.80 
4 14 42 ‘ ) 0.89 0.65 


oy 


No WS & W bo 


7 


Designs of this nature with u ~ 1 are also sometimes possible, but there is 
only one with 10 or fewer replicates. This is given by m = 5,u = 2,n = 2, 
h = 1, and gives rise to the following design with 10 replicates of 10 treatments 


on 20 blocks with 5 plots each, A, = 8, A» = 4, A; = 0.96, BE. = 0.87: 


AACCEEGGI I 
BBDDFFHHJ J 
EEGGIIAACC 
FrRFHHJJBBDD 
CDEFGHI JAB 


rGIil 
{HJ J 
I A 
J BB 
Y DEF 


1 ¢ 
| 
I 
J 
( 


1 
3 


The groups are AB, CD, EF, GH, and IJ, and the blocks are given by columns. 
A valid design is obtained by taking half the blocks, the first ten as the 
design is written down. The design has 5 replicates of 10 treatments on 10 blocks 
of 5 plots each, \; = 4, AX» = 2, EF; = 0.96, FE, = 0.87. 
Half designs are also possible for other cases where n = 2,h = 1,m = 2u + 





482 G. H. FREEMAN 


TABLE I 
Regular group divisible designs formed by the method of complete and incomplete 
groups 


= 





Oo 


2h Ww Ww W 


Noo Ww 
wm bo bo 


> or bo 











wr WOR WWONOWNOS HE Sw | 





w © 
no 





wily os hae ee 
 — =e ee et 





TABLE II 


Regular group divisible designs with half the blocks for complete 
groups with m = 3,u = 1 


Mi Ae 





| | 
| 
t 
| 
| 
| 


— pat 


| 


ano SP Ww 


| 


e 
i | 


oe  B OW CO DO 
Ne tb 


o 


~I 


ou 
pnd ek et et 








© 





1. Putting u = 3 and 4 respectively gives designs with 7 replicates of 14 treat- 
ments on 14 blocks of 7 plots each, A; = 6, A» = 3, E, = 0.98, EF, = 0.91 and 
9 replicates of 18 treatments on 18 blocks of 9 plots each, \y = 8, Ax = 4, Ay = 
0.99, EZ, = 0.94. 

Further, if there is a balanced incomplete block design of n treatments with 
h plots per block which is not unreduced, or if the unreduced design with these 
parameters is resolvable, a regular design may be possible without taking all 
possible blocks. The only design of this type of practicable size appears to be 





INCOMPLETE BLOCK DESIGNS 483 


that with m = 2,u = 1,n = 7,h = 3, which gives rise to a design with 10 
replicates of 14 treatments on 14 blocks with 10 plots in each as follows: 
A — G with HIJ, HKL, HMN, IKM, ILN, 
JKN, JLM; 
H — N with ABC, ADE, AFG, BDF, BEG, 
CDG, CEF. 
1 = 8, = 6, FE, = 0.98, EF. = 0.96 
Other designs, of a similar type but not necessarily containing any complete 
groups, can be obtained as follows. Let each block contain h; members from the 
jth group of treatments in all possible ways, there being m; groups of treat- 
ments. If 7 goes from 1 to s we have k = ai hym;, and the blocks can be 
divided into sets such that each set contains | |}. "Cy; blocks, where h; occurs 
m; times. The number of such sets of blocks is thus m!/T]j1 m;!, where 


8 
DD ja1 ms; = m. 


The design contains r replicates of mn treatments on (m!] |" Cr;)/L Im; blocks 
with >> hym; plots per block, where summations and products run from 1 to s 
and the term |] "Ch, contains h; m; times. Further, 


j=l 
and 


A: = ty ie "Cry ( Il "Cay.) ’ 


isi’ i?! 3,3" 
where h; occurs m; times. 

Thus, if h; is a constant, the design is semi-regular; otherwise it is regular. 
The only case with r S 10, k > 2 is given by m = 2,n = 3, 8 = 2, m, = m = 
1, hy = 2, he = 1, giving 9 replicates of 6 treatments on 18 blocks with 3 plots 
per block, 4; = 3, A» = 4, EH, = 0.78, FE, = 0.81. The design is: 


AAAA A A BBBODODODODODODEEE 

BRB B.SC. OC. ©.8. £2 Bee esas. oe ese 

DEFODEFODEFABCABCA BC 
The groups are ABC and DEP, and the blocks are given by columns. 

As before, only a fraction of the total number of blocks may be needed and 
two designs derived in this fashion have respectively m = 3, n = 3, s = 3, 
m, = Mm, = ms = 1, ky = 2, he = 1, hs = O, giving a half design with 9 repli- 
cates of 9 treatments on 27 blocks and 3 plots per block, \y = 3, A. = 2, and 
m= 4,n = 4,8 = 3, m = me. = 1, ms = 2, hy = 2, he = 1, hg = O, giving a 
1/6 design with 9 replicates of 16 treatments on 48 blocks with 3 plots per 
block, A: = 2, A.» = 1. Designs with these last two sets of parameters are given 
by Bose, et al. [1], but the blocks comprising the designs are different in 
each case, even though the efficiency factors are unaltered. 


4. Designs with groups arranged in sets. In certain designs the groups of 
treatments may be arranged in sets in such a fashion that treatments from 





484 G. H. FREEMAN 


groups within a set concur in one way while treatments from groups in differ- 
ent sets concur in another way. At first sight this would appear to lead to de- 
signs with three associate-classes, and in general it does, but in many particu- 
lar cases the treatments from different groups concur the same number of 
times whether or not the groups are in the same set In such a case the prop- 
erty of group divisibility ensures that there are only two associate-classes. The 
simplest of these designs, those with 2n plots per block, are derived below. 

Consider a design with 2n’ treatments in 2n groups of n members each, the 
design having 2n plots per block. Divide the groups of treatments into two sets, 
set 1 containing the first n groups and set 2 the remainder. The design then 
has blocks of the following kinds: 

(i) Blocks containing two complete groups from set 1 or set 2, there being 
[n(n — 1)]/2 from each set and thus n(n — 1) in all. There are thus (n — 1) 
replicates of each treatment in these blocks. 

(ii) Blocks containing one complete group from set 1 and one member from 
each group in set 2, or conversely. These blocks are sufficient in number that 
each group from one set occurs once with every treatment from the other and 
that the groups of a set occur equally frequently. This necessitates n’ blocks 
with complete groups from one set and one member from each group of the 
other, and thus 2n’ blocks in all. Further, in order that each treatment shall 
occur once and once only with all treatments in the same set but different 
groups, » must be such that there are (n — 1) orthogonal Latin squares of 
side n. Each treatment is replicated 2n times in these blocks. 

The complete design thus has (3n — 1) replicates and n(3n — 1) blocks. 

In blocks of the first kind, each treatment concurs (n — 1) times with treat- 
ments of its own group, once with treatments of the other groups of its own 
set, and not at all with treatments of the other set. In blocks of the second 
kind, each treatment concurs n times with treatments of its own group, once 
with treatments of the other groups in its own set, and twice with treatments 
of the other set. Thus the design is group divisible with A; = 2n — 1, A» = 2, 
and so, further, is regular. 

n = 2 and n = 3 give the only examples with 10 or fewer replicates, these 
having respectively 5 replicates of 8 treatments on 10 blocks with 4 plots each, 
u = 3, £, = 0.90, EF. = 0.85, and 8 replicates of 18 treatments on 24 blocks 
with 6 plots each, \; = 5, FE; = 0.94, EF, = 0.87. The design for n = 3 is: 


AADJ JIM AAADDDGGGJJIIJIMMMPPP 

BBEK KN BBBEEEHHHKKKNNNQQQ 

CCFLLO CCCFFFIIILLLOOORRR 

DGGMPP JI KLJIKLJIKLABCABCABC 

EHHN QQ MNONOMOMNDEFEFODFDE 

FIITORR PQRRPQQRPGHIiIIGHHAIG 
Blocks of the first kind Blocks of the second kind 


The groups are ABC, DEF, GHI, JKL, MNO, and PQR, and the blocks are 


given by columns. 





INCOMPLETE BLOCK DESIGNS 485 


If there are 3n groups of n members each instead of 2n groups, a design with 
2n plots per block is possible in a similar fashion. Here, however, only the second 
kind of block described above is used, and so the design is, in a sense, a com- 
plete and incomplete group design as understood in the last section. The three 
possible pairs of sets of groups all have to be considered, thus giving rise to a 
design with 6n” blocks and 4n replicates. If two treatments belong to differ- 
ent groups, whether of the same set or not, they concur twice, i.e. A. = 2; 
further, ), 2n, and the design is thus regular group divisible. The design 


with n = 2 is the only one with 10 replicates or fewer, and has in fact 8 
replicates of 12 treatments on 24 blocks with 4 plots each, A; = 4, A» = 2, A 
= 0.88, EF, = 0.81. The design is: 


AACCEEGGAACCIIKKEEGGII KK 
B 
i 
( 


BDDFFHHBBDDJJLLFFHHJIJJLL 
FEFABABIJIJABABIJIJEFEF 
HHGCDDCKLLKCDDCKLLKGHHG 


The groups are AB, CD, EF, GH, IJ, and KL, and the blocks are given by 
columns, 


‘ 


In the same way that, for complete and incomplete groups half designs are 
possible with m = 3, u = 1, so also are half designs possible here by means of 
the same device, i.e., with complete groups from the first set and incomplete 
from the second, and so on in a cyclic fashion only. n = 2 gives a design with 
4 replicates of 12 treatments on 12 blocks with 4 plots each, and this design is 
given by Bose, et al. [1]. In general the design has 2n replicates of 3n” treat- 
ments on 3n° blocks with 2n plots each, \, = n, x = 1. For n = 3, 4, or 5 re- 
spectively, the design thus has 6, 8, or 10 replicates and plots per block, and 
it has 27, 48, or 75 treatments and blocks; A, = 3, 4, or 5; EZ, = 0.92, 0.94, or 
0.95; E, = 0.85, 0.88, or 0.90. 

Designs of this kind arc possible with more than 2n plots per block, but the 
numbers of replicates and plots per block very soon give designs which are 
beyond the bounds of practicality. Even the smallest designs with 3n’ treat- 
ments and 3n plots per block have 12 replicates. However, the smallest design 
with 2n’ treatments and 3n plots per block, that with n = 3, gives a more prac- 
tical design. This design, which has 9 replicates of 18 treatments on 18 blocks 
with 9 plots each, A; = 6, \» = 3, BE, = 0.96, E. = 0.91, arises in each of the 
following forms: 


A A A 
B B 
or 6: +¢ 
D G 


— 


M 
N 
O 
P 
Q 
R 
A 
FP 
H 


CN ee ee et 
ma es Ee 


oa 
— 


Aa 





- H. FREEMAN 


D A A P P MMM J J 
E B B Q2 Q N N K K 
; cc R R L L 
K x2 AA AABAOA 
L oe Cc BC Cc CB 
M N i Db’ D dD t D E D 
O N O , 2s ye 2 Se Us FF 
P ? GGGH G GH 
QR R Q m_— as s . eT 


The groups are ABC, DEF, GHI, JKL, MNO, and PQR, and the blocks are 
given by columns. 

These two designs illustrate the principles on which the general designs of 
this type are derived, viz., either two complete groups from one set and one 
member from each group of the other set or one complete group from one set 
and two members from each group of the other set. 


QvuAeAcemo 


R 


5. Designs derivable by addition. Bose, ef al. [3] describe designs de- 
rivable by addition of further blocks to a balanced incomplete block design with 


TABLE III 
Regular group divisible designs derivable by addition 


by Type | Rep. | rz | 


| be | Type Rep. | ¢ b A | 
| 
a 
10 
10 


DOAN II SHAK | 3 





).83 | 


12 
14 


bt be 


10 
9 | 18 | 
7| 14 
9 | 18 | 
s 
5 | 
i 9 
| 10 
10 
s 
| 9 
10 | 
10 
| 10 
| 10 
10 


10 
10 76 81 
.74 | 0.81 
.83 
88 | 0.84 
89 | 0.82 
.90 | 0.80 
.90 | 0.85 
.94 | 0.83 
83 | 0.86 
0.95 | 0.93 
0.88 | 0.76 
0.90 | 0.82 
0.90 | 0.88 
0.80 | 0.64 


10 | 


>» & dH 


1 

1 

1 

1 

1 

1 

12 1 
14 | 1 
14 1 
| 1 

1 

1 

1 

1 

1 

1 

1 

1 


=e DS Se Oh | 








14 
14 
10 | 
12 
12 | 
| 15 
15 
/9 | 18 | 
| 4 | 16 | 





a w tw bt 
MRM MN 








~~ eh eee Pe ww HH WW 


8 | 
| 
9 | 
10 | 
0 | 
| 


10 


oO 
TM 
orwr ww WwW WK WW DW W DH 


wNwWrRr RRP NNN WWWHW NWN WD 





anwortnonuat’ *& NN D> Ore Ww 


oo 
~~ 


or 


TR 





TT M 
ee ee ee 


12/3/3 


ot, > 


5 


eBHNaWNHNHN FF KF NN ND WO WW W 
TR 





woererekehr NW WWWerR KR SE NNN eK Re 


be 
— we CoO bo 


~~ oO 


12 | R 


r: , b} and rz, 6b: are the numbers of replicates and blocks in the two designs which 
are added to make the final design given here, the number of complete replications of 
these initial designs being given in the appropriate column. The type of initial design 
is also given, T for totally balanced incomplete block designs and 8, SR, and R respec 
tively for singular, semi-regular, and regular group divisible incomplete block designs. 





INCOMPLETE BLOCK DESIGNS 487 


v = mk, m blocks giving a complete replicate, and designs derivable by taking 
together the blocks of two group divisible designs with the same v and k. How- 
ever, it is possible to add blocks to a balanced design even with »v # mk, or v = 
mk when m of the balanced design blocks do not give a complete replicate, or to 
add a group divisible design to a balanced design, all of which amount to the 
same thing. New designs (r < 10) derived by these and other addition methods, 
are given in Table III. An example of this type with »v = mk is given by the fol- 
lowing design with 7 replicates of 6 treatments in 2 groups of 3 on 14 blocks with 


3 plots each, 4, = 4, \» = 2. The groups are ABC, and DEF, and the blocks are 
given by columns. 


RAA'RS 2 OD SS Soe A DA 
BBCDECODED D B E B 
CO Oo?" aoe. a C F C F 
Totally balanced incomplete block | Partially 
balanced 
(singular, disconnected) 


An example with v + mk is the design with 10 replicates of 10 treatments in 
2 groups of 5 on 25 blocks with 4 plots each, \; = 5, A. = 2. The design is: 


ABAAABCDDABCA BCAAAABFFFFG 
BCCBDEDEFEFFF DEBBBCCGGGHH 
CEGIEGI HGGHGH GHCCDDDHHIII 
DFHI FILS Li J¢.4 1.) HJ DEEEEIJJJJ 
Totally balanced incomplete block | Partially balanced 
(disconnected ) 


The groups are ABCDE and FGHIJ, and the columns represent the blocks. 
This design illustrates the point made by Bose, et al. [3] that it does not mat- 
ter if a design is disconnected if it is added to another design to form a new one. 


REFERENCES 


{l] R. C. Boss, W. H. Ciatworrny, anp S§. 8. SurrkHanpe, ‘Tables of partially balanced 
designs with two associate classes,’’ North Carolina Agricultural Experimental 
Station Technical Bulletin, No. 107 (1954). 

[2] R. C. Bosz anp W. 8. Connor, “Combinatorial properties of group divisible incom- 
plete block designs,’”’ Ann. Math. Stat., Vol. 23 (1952), pp. 367-383. 

[3] R. C. Bosg, 8. 8. Surrkwanpe, anp K. N. Baarracnarya, “On the construction of 


group divisible incomplete block designs,’’ Ann. Math. Stat., Vol. 24 (1953), 
pp. 167-195. 





ON PARTIALLY BALANCED LINKED BLOCK DESIGNS 


By J. Roy ann R. G. Lana 
Indian Statistical Institute 


1. Summary. The computations in the analysis of any equireplicate design 
can be carried out very easily if the number of treatments common to any two 
blocks is constant. A design with this property is called a Linked Block (LB) 
design and was introduced by Youden [9]. It is well known that for a Balanced 
Incomplete Block (BIB) design to have a constant number of treatments in 
common between any two blocks, it is necessary and sufficient that it is sym- 
metric, that is, the number of blocks is equal to the number of treatments. 
In this paper, necessary and sufficient conditions are derived for any design 
with a given treatment-structure matrix to be of the LB type and the results 
applied to Partially Balanced Incomplete Block (PBIB) designs. Finally a 
list is prepared of all LB designs in the class of two-associate PBIB designs 
enumerated by Bose, Shrikhande and Clatworthy [2]. 


2. Introduction. An arrangement of v treatments in 6 blocks, each of k plots, 
k < v, such that each treatment occurs at most once in any block and altogether 
in r blocks is called an incomplete block design and denoted by D(»v, b, k, r). 
Obviously vr = bk. A design with b = v is called symmetric. A D(v, b, k, r) 
is completely characterised by its ‘incidence matrix’ N = ((n;;)) where n;; = 1 
if the ith treatment occurs in the jth block and n;; = 0 otherwise i = 1, 2, --- , 
v,j = 1,2, --- ,b. The matrix A = NN’ where \y = rand );; = the number of 
blocks in which the treatments 7 and 7 occur together i # 7 = 1, 2,---,vis 
called the ‘treatment-structure matrix’ of the design. A design is balanced if \;; = 
\ for all i + 7. The design obtained from D(v, b, k, r) by considering its blocks 
as treatments and treatments as blocks is called its dual. The number of treatments 
common to the ith and the jth blocks will be denoted by u;; , 7,7 = 1, 2, --- , b. 
The matrix M = N’N has been called the ‘structural matrix’ by Connor [5] 
and Connor and Hall [7]. We shall however call M the ‘block-structure matrix’. 
A design is of the LB type if u:; = uw for all i # 7. Obviously a design is of the 
LB type if and only if its dual is balanced. For a definition of PBIB design the 
reader is referred to Bose and Shimamoto [4]. 


3. Conditions for a design with given treatment-structure matrix to be of 
the LB type. We first give two lemmas which are useful in deriving necessary 
and sufficient requirements on the treatment-structure matrix. 

Lemma 3.1. /f A and B are two matrices of the form m XK n and n X m respec- 
tively, the non-zero latent roots of AB are identical with those of BA and if cor- 


responding to a latent root 0, is a latent vector of AB,» = £A will be a latent vector 
Received December 12, 1955; revised October 10, 1956 


488 





LINKED BLOCK DESIGNS 489 


of BA corresponding to the same root 6. If 6 is a non-zero repeated latent root of AB 
of multiplicity r, it is so for BA also. 

Proor. If @ is a non-zero latent root of AB of multiplicity r, we can always 
find r linearly independent vectors £; satisfying §£:AB = 6&; (¢ = 1, 2,---, 1) 
Post-multiplication by A gives 7;BA = 6n, where n; = §A. That »,’s are linearly 
independent follows from the linear independence of £;’s because for any set of 
constants c¢,’s, > cnB = 6>> c.£;. This also shows that @ is also a latent root 
of BA of multiplicity r and n,’s are a corresponding set of linearly independent 
latent vectors. 

LemMa 3.2. The necessary and sufficient condition for a symmetric matrix A of 
order n to have all its diagonal elements equal and all its off-diagonal elements equal 
is that it has only two latent roots, one of multiplicity (n — 1) and the vector (1, 
1, --+ , 1) ts a latent vector corresponding to the other latent root. 

Proor. Necessity is obvious. To prove that the conditions are sufficient, let 
us write 


2 Seah) 


Since A is symmetric there exists an orthogonal matrix 


[3] 


such that 


’ is 6; 0 
¢ ac i 62 , . 


where 6, and 6, are the latent roots, 6. with multiplicity (n — 1) and J, is the 
identity matrix of order n. Premultiplying by C’ and post-multiplying by C we 
get 


A = 0,0’ a + 6.P’P = (6; - 6s)’ ce + bol n ’ 


which has diagonal elements equal to {@ + (nm — 1)@}/n and off-diagonal 
elements equal to (6; — 62)/n. 

We are now in a position to prove 

THEOREM 3.1. The necessary and sufficient condition for a design D(v, b, k, r) 
to be of the LB type is that k — yu is a latent root of the treatment-structure matrix 
A, of multiplicity (6 — 1) where wp = k(r — 1)/(b — 1). 

Proor. The necessity is obvious. To prove the sufficiency of the conditions, 
let us write N for the incidence matrix of the given design. Then we have to show 
that the block-structure matrix M = N’N has all off-diagonal elements equal. 
Since it is given that k — yu is a latent root of multiplicity (b — 1) of the treat- 
ment-structure matrix A = NN’ by Lemma 3.1 it will be so for M = N’N also. 
Again since the total of each column of A is rk, « = (1, 1, --- , 1) is a latent vec- 
tor of A corresponding to the latent root rk. Again by Lemma 3.1, «€N = (k, k, 





490 J. ROY AND R. G. LAHA 


--+ | k) is a latent vector of M corresponding to the latent root rk, therefore so 
is also the vector (1, 1, --- , 1). Thus M satisfies all the conditions of Lemma 
3.2. Hence it has all diagonal elements equal to k and all off-diagonal elements 
equal to u. But since » = > 1-4 nm; for all i + j and n,; = 1 or 0, « must be 
integral. The number of treatments common to any two blocks is thus yu. 

Corouuary 3.1. If the treatment-structure matrix of any design D(v, b, k, r) 
has only two non zero latent roots, rk and k(b — r)/(b — 1) and rk is not a repeated 
root, then the design must be of the LB type. 

Proor. If ¢ is the multiplicity of the root k(b — r)/(b — 1) equating the sum 
of the diagonal elements of the treatment-structure matrix to the sum of the la- 
tent roots, we get t = (b — 1). Hence the result. 

Corouiary 3.2. If D(v, b, k, r) is balanced, the necessary and sufficient condition 
that it is of the LB type is that v = b. 

Proor: Since the design is balanced, its treatment-structure matrix has a latent 
root of multiplicity » — 1, while if it is of the LB type the multiplicity must be 
b — 1. Hence b = ». 


4. Partially balanced linked block designs. We now apply the results of 
Section 3 to the special case of PBIB design D(v, b, k, r) having m associate 
classes with parameters n; , \; , pj.(t, j, 8 = 1, 2, --- , m) as defined in Bose and 
Shimamoto [4]. It follows from the results of Connor and Clatworthy [6] that 
latent roots other than rk of the treatment-structure matrix for such a design 
are, except for repetitions, the same as the latent roots of the reduced matrix 
A = ((a;;)) of order m where 


(ai; =-=rf+ MPin + Api2 ++ + \mPim — AN, 
(4.1) {ai = pir + Apia +--+ + AmPim — AM, 
t#j=1,2,---m. 


Hence we have the 

THEOREM 4.1. The necessary and sufficient condition for a PBIB design with m 
associate classes to be of the LB type is that the matrix A defined in (4.1) has only 
one non zero latent root k(b — r)/(b — 1). 

Coro.uuary 4.1. The necessary and sufficient condition for a PBIB design with 
two associate classes to be of the LB type is that 


(4.2) A022 — Ande = 0, 
(4.3) Qu + dn = k(b — r)/(b — 1), 
where 
= r+ Mp + Apis — um, 
Mpir + Aspiz — Am , 
Mpa + Aepr2 — AvNe , 
r+ Apa + Aspe — AoMe . 





LINKED BLOCK DESIGNS 


Proor. The matrix ™ _ cannot have two equal latent roots. 


aa 
We shall now apply this result to some special types of PBIB designs with 
two associate classes. 
4.1 Group Divisible (GD) Designs. A GD design as defined by Bose and Con- 
nor [1] is specified by the parameters 


v = mn, m = (n — 1), Ne = m(n — 1), 


j n—2 0 oy, _ 0 n—1 
(ph) = l > aa al (ph) = x eis of 


They have classified the GD designs as (i) Singular when r = ), (ii) Semi- 
Regular when r > ), and rk = vd, and (iii) Regular when r > d; and rk > vr.. 
They have also shown that for a Singular GD design, b 2 m and for a Semi- 
Regular GD design b 2 v — m + 1. In order that a GD design may be of the 
LB type the condition (4.2) gives 


(r — dx) (rk — vdA2) = 0, 


so that a Regular GD design is never of the LB type. The condition (4.3) gives 
for the case r = A, b = m and for the case rk = vy, b =v — m+ 1. We 
now summarise these results in the form of the 

THeoreEM 4.2. A Regular GD design cannot be of the LB type. The necessary 
and sufficient condition for a Singular GD to be of the LB type is that b = m and 
that for a Semi-Regular GD is that b = v — m + 1. 

Bose and Connor [1] have shown that a Singular GD design can always be 
derived from a BIB design with m treatments by replacing each treatment by a 
group of n treatments. Hence the condition for a Singular GD design to be of 
the LB type is that the BIB design from which it is generated should be sym- 
metric. 


4.2 Triangular Designs. In a Triangular design (Bose and Shimamoto [4]) 
= gn(n Sa 1), — 2(n ig 2), n= 3(n Spee 2) (n wed 3), 


1 n—2 n—3 $y, _ 4 2(n — 4) 
((pjs)) = [" = An - 3)(n — ah ((pje)) ~ a — 4) A(n es 4)(n some mi 


In order that a Triangular design may be of the LB type the condition (4.2) 
gives 
(r — 2, + de) {r + (n ~- 4)yy * (n ~~ 3)A2} = 0. 

From the other condition (4.3) we get b = n if r — 2A. + A. = O and b = 
k(n — 1) (n — 2) if r+ (nm — 4)\. — (nm — 3)A2 = O. We thus get the 

THEOREM 4.3. The necessary and sufficient condition for a Triangular design to 
be of the LB type is that either (i) r = 21 — de and b = n or (ii) r = (n — 3)ry 
— (n — 4)d; and b = 3(n — 1)(n — 2). 

It is interesting to note that in case (i) if r = 2 we get the Triangular Singly 
Linked Block (TSLB) designs and in case (ii) if r = n — 2 the Triangular 
Doubly Linked Block (TDLB) designs as defined by Bose and Shimamoto 





492 J. ROY AND R. G. LAHA 


[4]. The results of Theorem 4.3 may thus be considered as generating the class 
of Triangular Multiply Linked Block designs. 

4.3 Two associate PBIB designs with k > r = 2 and \; = 1, Ax = O (Simple 
PBIB). Bose and Clatworthy [3] have shown that all designs of this class are 
characterized by the parameters 


=k(r-1(k-1)+d/t, b=r{(r —1)(k-—1)4+ d/t, 
nm, = r(k — 1), nm. = (r — 1)(k — 1)(k — 2)/t 


(( es ee eae (r — 1)(k — 2) 
Pie) (r — 1)(k — 1) (r — 1)(k — t(k —t — 1)/t 


((p3) i rik —t — 1) | 

Pe) =| Gt 1) ((r — Ik — Ik — 28) + rt — B)/t 
where 1 S ¢ Sr. It is interesting to note that in this case the condition for an 
LB design i is t = r and then » = 1. Hence we have the 

THEOREM 4.4. The only LB designs in the class of two-associate PBIB designs 
with k > r = 2 and ; = 1, Ax = O are those which are duals of BIB designs in 
which any two treatments occur together in just one block. 

Shrikhande [8] showed that the dual of any BIB design with \ = 1 is a two 
associated PBIB design with \, = 1, and A, = 0. Our result shows that no two- 
associate PBIB design with \; = 1 and 4, = 0 and k > r 2 2 can be obtained 
by dualising BIB designs other than those with A = 1. 

4.4 List of two-associate PBIB designs of the LB type. We give below a list of 
LB designs in the class of two associate PBIB designs enumerated by Bose, 
Clatworthy and Shrikhande [2]. The reference number for a design is the one 
used by the above authors and » denotes other number of treatments common to 
any two blocks. 


TABLE 4.1 


List of two-associate PBIB designs of the LB type. (S = Singular GD, 
SR = Semi- Regular GD, Sl = = = Same, T= = Triangular.) 


Reference No. B Reference No. | “ Reference No. 





™M 


1 


12 
1s 


99 


24 
28 


SR 1 
SR 

SR 26 
SR 32 
SR 5! 


1 Sl 25 
1 ‘ 
5 
5 
] 
SR 7 1 
] 
] 
1 
1 
] 
1 
1 
l 


MNPNRN 
— > CO - bo 


9} 


oO 


SR 85 
SR 

SI 4 
Sl 9 
Sl 17 
Sl 18 
SI 

Sl 22 


T™ TM 


41 


‘a 


DNR RN 


81 


Norwood & 


TR 


I 
I 
I 
I 
I 
I 
[ 
= 
T 
I 
[ 
I 
I 


TN 





LINKED BLOCK DESIGNS 


REFERENCES 

(1] Bosz, R. C., anp W. 8. Connor, ‘‘Combinatorial properties of group divisible incom- 
plete block designs,’’ Ann. Math. Stat., Vol. 23 (1952), pp. 367-382. 

[2] Bosz, R. C., W. H. CLatworrsy, anv 8S. 8. SuHrrkHanpe, Tables of Partially Balanced 
Designs with Two-associate Classes, Technical Bulletin No. 107, North Carolina 
Agricultural Experiment Station (1954). 

[3] Bosz, R. C., anp W. H. Ciatworrtuy, ‘Some classes of partially balanced designs,”’ 
Ann. Math. Stat., Vol. 26 (1955), pp. 212-232. 

[4] Bosz, R. C., anp T. Sarmamorto, “Classification and analysis of partially balanced 
incomplete block designs with two associate classes,’’ J. Amer. Stat. Asen., 
Vol. 47 (1952), pp. 151-184. 

[5] Connor, W. S., “On the structure of balanced incomplete block designs,’’ Ann. Math. 
Stat., Vol. 23 (1953), pp. 57-71. 

(6] Connor, W. 8., anp W. H. CLatworruy, ‘‘Some theorems for partially balanced de- 
signs,’”’ Ann. Math. Stat., Vol. 25 (1954), pp. 100-112. 

[7] Connor, W. 8., anp M. Hatt, ‘“‘An embedding theorem for balanced incomplete block 
designs,’’ Canadian J. Math., Vol. 6 (1953), pp. 35-41. 

[8] SuHrikHanpg, 8. S., “On the dual of some balanced incomplete block designs,’’ Bio- 
metrics, Vol. 8 (1952), pp. 66-72. 

({9] Youpen, W. G., “Linked blocks: a new class of incomplete block designs’”’ (Abstract), 
Biometrics, Vol. 7 (1951), p. 124 





NOTES 


ON BOREL FIELDS OVER FINITE SETS 
By G. Szekeres AND F. E. Brner 


University of Adelaide and University of Melbourne 


1. Summary. It is shown that the number of Borel Fields over a set (S) of 
n elements is equal to the number of equivalence relations within S. This num- 
ber is asymptotically equal to 


(6 + 1)" exp {n(@-—1+,")—1} where fSexpB =n. 


2. Enumeration of Borel Fields over a finite set. Borel Fields are usually 
(e.g. Wald [8]) defined over a set of non-enumerably infinite elements: with 
quite trivial changes, the definition is applicable to finite sets, as follows: 

Let A, B, C, --- denote distinct subsets of a set S of n elements. 86 = {A, B, 
C, --+} is called a Borel Field (BF) if and only if 

(i) B is not empty; 

(ii) AeB, Be®B imply 

AnBeS, Au BeB, S — AcB. 

It follows from the definition that a BF contains at least the empty set (2) 
and S, and is closed with respect to the formation of unions, intersections, and 
complements. 

To enumerate the BF’s, consider the subset $ consisting of all P,,e8(m = 
1, 2, --- , r; for some r = 1, 2, --- , n) such that 


(1) P # &, 
(2) A # @, A #P, AcB implies ACP; 


in others words, no P contains an element of 8 as a proper subset. It follows that 
(3) Pan Py = OD (for m # m’) 


and 


(4) UaP. = S. 


If (3) were not true, the intersection, itself being an element of the BF and also 
a proper subset of a P, would involve a contradiction with (2); if (4) were not so, 
the complement of this union, being an element (other than @) of the BF and 
therefore not containing a subset of any other P, would itself be a P, namely 
P,4;, contradicting the definition of P = {P,,}. 

It is obvious that a BF defines a unique 3; conversely a $ defines a unique BF 
as follows: 


B= {G;Pi,P2,---, Pr; (2) elements like P; U P; ; 
Received June 14, 1956. 
494 





A BOREL FIELD 


(3) elements like P; U P, U Ps; --- ; S}. 


Thus every BF consists of 2” elements, the number of BF’s with 2’ elements 
being the same as that of $’s with r elements. This latter, however, is known to 
be A’0"/r!, where A’0” is the leading rth difference of nth powers of the non- 
negative integers. 

It is obvious from the foregoing that the total number of BF’s over S is the 
same as the total number of $’s, namely 


>, A’0"/r! = Ga, 

r=) 
say; it is also equal to the number of equivalence relations within S. It is well 
known that 


n 


(6) > 2"G,/n! = exp (e* — 1); 


n=O 


in conventional symbolic notation Gay, = (1 + G)”. Bell (2] gives this re- 
currence relation as well as several realizations of G, . We give two further simple 
realizations: 


First, (6) shows that G, is the nth power-moment, around zero, of the Poisson 
distribution with unit parameter, 


Pr(X = x) = (ex!)", i 


Second, (see, for example, Fisher [4]), 


vend 


(7) A’0"/r! = zn II (Death 4 


where summation takes place over all R, v, k, such that 


R 


(8) 7. vk, = n, 


vem] 


R 


(9) Dk =F. 

v= 
The typical term is the number of ways n elements can be distributed corre- 
sponding to the partition of n, symbolically represented by 


k 
yg. ghee RM 


with v, k, satisfying (8) and (9). Dropping the restriction due to (9), but keeping 
that due to (8), the sum becomes G, . 


3. Evaluation of G,. For n = 1 to 20, Epstein [3] tabulates G, , using (5). 
He also gives an asymptotic evaluation of G, , expressed in terms of the function 
W(x) = d/dx log T(r) and the numbers a, defined through the relation 


aWv(a, +1) = n. 


1 For n = 21 to 51 an unpublished table has been prepared by Francis L. Miksa, 613 
Spring Street, Aurora, Il., U.S.A 





496 G. SZEKERES AND F. E. BINET 


We shall give here a more direct asymptotic expression for G, in terms of ele- 
mentary functions; it is obtained by evaluating 


(10) Z. $ g ty exp (e’) dz, 
e 


where C is a simple contour enclosing the origin of the z-plane. Clearly by (6) 
and by Cauchy’s theorem, 


(11) G, =-—1 
2rie 
To obtain an asymptotic expression for J, , we specify C in (10) by | z 
with 8 = B(n) defined by 


(12) Be” n; 


then C intersects the positive real axis very nearly at a point where the derivative 
of the integrand vanishes, and the integral can be evaluated by the method of 
steepest descent. By a modification of Watson’s Lemma (see Jeffreys [6)) it can 
be shown (details are given in the Appendix) that 


G, = nl exp (nd — 1)8*{2en(6 + 1)}-” 
(13) | 
X{1 — (26 + 98° + 168" + 68 + 2)(24n)"(8 + 1) + O(6'n™)}; 
or using Stirling’s formula this simplifies to 
G, = (6+ 1)” exp in(gB —1+8 ') = il 
(14) X{1 — 6(267 + 76 + 10)(24n)(6 + 1)7* + O(¢*n™)} 


(15) = (8 + 1)” exp {n(B — 1 + 6) — 1}{1 + O(8n”)}. 


These are the required asymptotic formulae. It should be mentioned that (15) 
can also be obtained from Epstein’s result, with the help of Stirling’s formula; 
but (14) would require the knowledge of Epstein’s second asymptotic term which 
has not been determined explicitly in his paper. 

The following table gives comparative values of log Gs; as computed from the 
various asymptotic formulae: 


log Gs: (true value) 111.707033 
from (14) 111.707084 
from (15) 111.712500 
from Epstein 111.706867 


The true value was obtained from Miksa’s value for Gs; (l.c. footnote 1). 
By a similar method as above it can be shown that for r < n/log n 


0 ar Pol (320) x4 0(2). 


n/) 





A BOREL FIELD 


This sharpens Jordan’s result [7] 
lim r-"A’0” = 1, 


and establishes a connection between (5) and the known formula (see, for ex- 
ample Bell, [2}) 


a 
G.=¢" 2: r’/r\. 
rl 


Other asymptotic formulae for A’0” have been obtained previously by Hsu [5] 
and by Arfwedson [1], the former being valid when n — r = O(n”), the latter 
when r = Kn, for any constant K < 1. 


4. Appendix. From (10) we get (with z = Be) 


I,=1 | 8” exp{—nig + exp(Be*)} de 
(Al) = 


+6 ® e—s . 
ip” exp?) [ + I + / exp(—nig + exp(Se"*) -2)} dg, 
where 0 < 6 S x. We can choose 
(A2) =n, 


Then we have, for 6 S ¢ S z, 


| exp {—nig + exp (Be'v) — e} |S exp (fos a) é) 


< exp {—46e°(1 — cos 5)} 


‘é exp { —cn*} 


, 


for a suitably chosen constant c > 0. Hence 


(A3) / | < m exp {—cn'”} 
18 


and similarly 
| —$ 
(A4) | <r exp { —en"*} 
in (Al). 
For —é6 S ¢ S 6 the integrand in (A1) can be rewritten 

exp {—nig + exp (Se) — e*} 
exp {—nig + exp (8 + ify — 48¢° — tise’ + O(6¢')) — &*} 
exp {—nig + (1 + ie — 36¢" — dpe’ — hip” 


— pip ’y’ + 0(8%¢)) — &*} 





G. SZEKERES AND F. E. BINET 


= exp {—43ng*(1 + 8)} X {1 — Jin(1 + 38 + Be 
(A5) + O(n’p*p° + np*e')} 


by (12), where the 0-notation refers to n — ©. Use has been made of nf’¢" 
being small when | ¢| S 4 and n is large; this follows from (12) and (A2). 

The second term in (A5) is an odd function of ¢, therefore its integral from 
—é5 to +6 vanishes and we get 


[= [exp tae + 0) ae 


+0 (f (n’B'e® + np*¢*) exp {—4ng"(1 + 8)} ae) 


x 
—k 


= (§n(1 + 8)" [ n™ dv + 0(6'*n-*”), 


where k = 6(3n(1 + 8))"*. Now 
| ec’ dv< | ve” dv = 36" = 4 exp {—4n(1 + 8)8"} < 4 exp (—43n'” 
k K 


° . . ° . - —y? 
and a similar inequality holds for ‘Ze dv. Therefore replacement of the 
limits +k by +@ in (A6) causes au error not exceeding exp (—4"”), and we 


6 
[, = @x/na + 8))"* + O(0"*n-*") 


(A7) = (24 /n(1 + 8))'*{1 + 0(6/n)}. 


Summarizing (10), (11), (Al), (A3), (A4), and (A7), the leading term of (13) 
is obtained. The term with 0(8/n) (and if necessary, any further terms in the 


asymptotic expansion) can be obtained by carrying further the expansion under 
(A5). 


5. Acknowledgment. We thank J. Riordan for bringing to our attention the 
isomorphism of distributions and equivalence relations, the availability of the 
unpublished table mentioned in footnote 1, and for the value of Gs: . 


REFERENCES 
[1] Arrwepson, G., Skand. Aktuarietids., Vol. 36 (1951), pp. 121-132 
[2] Bex, E. T., Ann. Math., Vol. 39 (1938), pp. 539-557. 
[3] Epstein, L. F., J. Math. Physics, Vol. 18 (1939), pp. 153-173. 
[4] Fisuer, R. A., Philos. Trans. Roy. Soc. London, Ser. A, Vol. 222 (1922), pp. 309-369. 
[5] Hsu, L. C., Ann. Math. Stat., Vol. 19 (1948), pp. 273-277. 
[6] Jerrreys, H., anv B. 8. Jerrreys, Methods of Mathematical Physics, University Press, 
Cambridge, 1946, p. 472. 
[7] Jonpan, C., Téhoku Math J., Vol. 37 (1933), pp. 254-278; Vol. 38 (1953), p. 481. 
[8] Waxp, A., Statistical Decision Functions, John Wiley & Sons, New York, 1950. 





TRANSIENT MARKOV CHAINS 499 


ON TRANSIENT MARKOV CHAINS WITH APPLICATION TO THE 
UNIQUENESS PROBLEM FOR MARKOV PROCESSES' 


By Leo BREIMAN 
University of California, Berkeley 


1. Summary. We focus our attention herein on a Markov chain 2, 2%, --- 
with a countable number of states indexed by a subset I of the integers and with 
stationary transition probabilities p;; , and explore the sets of states defined by: 

A transient set of states C is said to be denumerably atomic if P(x, ¢ Ci.o.) > 0 
and if for every infinite set A C C we have z, ¢ C i.o. implies z, ¢ A i.o. with 
probability one (a.s.). 

Following Blackwell’s basic paper [1] which introduced the systematic use of 
martingales into the study of Markov chains, we use the semi-martingale con- 
vergence theorem [2] to characterize denumerably atomic sets in terms of the 
bounded solutions of the inequality 


o(i) S Doser Di(); iel. 


For chains whose state space contains a denumerably atomic set a convergence 
criterion for certain sums }~%~of(zn) is then developed. The application of this 
criterion to a restricted class of continuous parameter Markov processes gives 
simple necessary and sufficient conditions for the existence of a unique process 
satisfying given infinitesimal conditions. This last result illuminates the con- 
nection between the necessary and sufficient conditions given by Feller [3] for 
uniqueness and the simpler conditions for birth and death processes given 
recently by Dobrusin [4], more recently by Karlin and McGregor [5], and by 
Reuter and Lederman [6] (see also [7]). 


2. Characterization theorem. 

THEOREM 1. The necessary and sufficient condition for a transient set of states C 
such that P(x, ¢ C i.o.) > 0 to be denumerably atomic is that any bounded solution 
b(t) of 


(A) $(i) S Diet Di) 


satisfy lim infjec $(4) = lim supiec G(7Z). 

Proor. Let C be denumerably atomic and ¢(7) any bounded solution of (A). 
Then E(¢(2n) | tnt, °** , %) 2 O(2n-1) so that by the semi-martingale con- 
vergence theorem ¢(z,) converges a.s. to a function f(w). Let A; , A: be infinite 
subsets of C such that for 7 ¢ Ai, ¢(7) < a and fori ¢ Az, ¢(t) > 8B > a. Then, 
since almost every sample path which is in A; i.o. is in Ag i.o. the limit of ¢(z,) 
cannot exist. 


Received July 2, 1956. 
1 This paper was prepared with the support of the Office of Ordnance Research, U. S. 
Army under Contract DA-04-200-ORD-171. 





500 LEO BREIMAN 


Conversely, let A be any infinite subject of C and take & to be the event that 
2» is never in A. Let (7) = P(&| xo = 7), then: 


o(t) = E(P(&| 2, %1)|% = t) = Dieta Pi) S Doser Pixh()). 


Since ¢(7) is zero on A, we have lim inf;.¢ ¢(i) = 0 whence lim sup;.c ¢(7) = 0. 
This implies that for almost every sample path zx , --- which is in C i.0., $(2,) 
converges to zero. But by the martingale convergence theorem, since 
P(& | an, °°-, Xo) S (xn), the indicator J,(w) of the set & is zero for almost every 
w € [z, € C i.o.] and therefore almost every such w is in A i.o. 

We note, for future use, that if C is denumerably atomic and if ¢(7) is a solution 
of (A) satisfying E | ¢(x,) | < K, the conclusion lim inf;.¢ ¢(7) = lim supiec $(7) 
remains unaltered. 

It is interesting, as well as necessary, to know that any denumerably atomic set 
C can be embedded in a set C which is a maximal denumerably atomic set. That 
this is so follows from Blackwell’s work [1] in the following sense: there is a set 
C > C such that C is denumerably atomic and z, ¢ C i.o. implies z, ¢ C for all 
sufficiently large n a.s. 


3. Convergence criterion. ‘Ihe above characterization leads to a convergence 
criterion reminiscent of the Three-Series theorem. 

THEOREM 2. Let C be denumerably atomic and f(7) a finite nonnegative function on 
I such that f is zero outside of C. Then the sum + S (an) converges a.s. if and only 
if >So Ef(an) converges and otherwise diverges with probability equal to 
P(a, eC ie.). 

Proor. Let S = > 2-0 f(2n) and ¢(7) = P(S < d| xo = 2). Then ¢(2) satisfies 
inequality (A). Suppose lim inf;-¢ ¢(i) = 0 for every value of d, then lim supiec 
¢(t) = 0, and ¢(z,) — 0 a.s. on the set [z, ¢ C i.o.}. Since 


P(S < dian, +-+ , 2) & (Zn) 
it follows that Its<a;(w) = 0 a.s. for w € [z, € Ci.o.] and hence that S diverges 
a.s. on this set. 
From the above it follows that if S converges on some subset of [z, ¢ C i.o.] of 


positive measure, there is a 6 > 0 and d; > O such that P(S < d;|a = 1%) 26 
fori e C. Let R» = ) Tom f(xn) and define S,, as the set [R,, < d,]. Writing 


/ Ry, — [ Rint - | [Rm := Ress! ae [ m+, 
Sm “Sm+1 “8m Ss 


8m 41—Sm 


using the definition of S,, 


| f (2m) s di P(Sm41 i Sm) + | Ry 
Sm 


8 


m 


and summing over all m results in 


x | 


m=) “8. 


f(am) < 2d). 





TRANSIENT MARKOV CHAINS 


But, 


[ fen = EIOP Rm < di| 20 = 1) Plem = i 


= 2 f@)P(S < di| 2 = )Plam = i) & BEF (en) 


which proves the theorem. 
Coro.tuary 1. Under the conditions of the above theorem, a necessary and suf- 
ficient condition for the a.s. convergence of >-%-0f(xn) is that the equation 


(B) a(t) = fli) + Lise Disa) 
have a bounded solution. 

Proor. Let > -%_0 f(zn) converges a.s. to S(w) < ©. Then a(i) = E(S | 2 = i) 
is a solution of (B) and —a(z) is a solution of (A) with Ha(z,) s ES. If a(¢) is 
unbounded, then lim sup;-c a(i) = lim infjc a(t) = «©, which implies that 
a(x,) — © on a set of positive measure and contradicts the boundedness of 
Ea(z,). Conversely, if (B) has a bounded solution a(z), then the iteration of 
(B) gives | a(t) — E(S-%-0f(xn) | zo = i) | S sup;er a(j) which implies the con- 
vergence of > *_o Ef(z,). 

We relate Theorem 2 to the uniqueness problem which involves global struc- 
ture, by confining ourselves to chains with a fairly simple decomposition. The 
following theorem is appropriate. Its proof follows immediately from the various 
definitions. 

THEOREM 3. Let the state space I of a Markov chain be completely decomposable 
into the set Cy of recurrent states, a set M of transient states such that 
P(x, € M i.o.) = 0, and a finite number of maximal denumerably atomic sets 
Ci, +++, Cw. If f is a function on I and if fi is that function which equals f on 
C; and is zero elsewhere, then > 20 f(x») diverges almost surely if and only if each 
> 2-0 fe(tn) diverges a.s. on the set [zn € C; i.o.]. 


4. The uniqueness problem. The synthetic uniqueness problem for continuous 
parameter Markov processes having states indexed by a subset J of the integers 
begins with a set of nonnegative constants q; , p:; , defined for 7, 7 ¢ J, and asks 
concerning the existence of a unique process X(t),0 S t < , having a given 
initial distribution and satisfying 


i. P(X(t) is constant in interval [s, s + 7] | X(s) = 7) = 1 — git + ofr) 
ii. P (First discontinuity of X(t), t 2 s, isa jump to7j | X(s) = 7) = p,;. 


Our remarks are restricted to the simple and common situation qg;, pij < ~, 
Leia Pig = 1. 

There is a general answer, [3], [6], and [8]: if no “explosions” are possible, if an 
infinite number of jumps cannot occur in any finite time interval, then there is a 
unique process satisfying (C) and having, in addition, all of the properties that 
could reasonably be desired. This is the “minimal” solution. In the contrary case, 





502 LEO BREIMAN 


there is in general no unique solution and the solutions that do exist are an- 
alytically or probabilistically pathological. 

To be more exact; the traversal time of each path of infinite length (a; , i2, ---) 
is a sum Q;, + Qi, +--+ of independent random variables with distributions 
P(Q;, > t) = exp (— qi,t). There is a Markov measure P on the space of all 
paths induced by the p,; and the given initial distribution. The minimal solution 
exists if and only if the transversal time is a.s. infinite for each path in a set of P 
measure one. Using the Three-Series criterion for the divergence of a sum of 
independent random variables and writing xo, x, , --+ for the chain associated 
with the measure P leads to an equivalent formulation. 

Uniqueness criterion. A minimal solution exists if and only if >0¢ 1/qz, di- 
verges a.s. 

The applicability of Theorem 2 is now apparent. For instance, in the birth and 
death process, the given constants are 


if i = 1, gi = Ai + iy Digs = A/c + wa), Dis = w/c + ws), Dig = 0 
otherwise; 


if i = 0, gi = 0, po = 1, pi; = O otherwise. 


If return to the origin is uncertain, the positive integers form a maximal de- 
numerably atomic set. The condition for the existence of a minimal solution, as 
given by Corollary 1, is that the equation 

(Ax + us)a(t) = 1+ A,a(i + 1) + wali se 1), +21 


have no bounded solution. A little formal computation yields the condition as 
stated in [4], [5], and [7]. 
Another interesting application is to the case 
Diy = Dei, O < Lise ips < @, 
where we restrict the state space J to those states with a positive probability of 
being entered. As pointed out to me by D. Blackwell, the basic theorem of re- 
newal theory, Chung and Wolfowitz [9], provides the simplest proof that the 


nonnegative integers J* in J form a maximal denumerably atomic set. By this 
theorem, the expression 


oO 
E (number of entrances into j| 2% = i) = ZA PS 
n=O 


approaches a positive limit as 7 — + © through J, which implies that for any 
infinite set A of positive integers in J, P(x, ¢ A i.o.) = 1. As the negative inte- 
gers I” in J have the property P(z, eI” i.o.) = 0, the necessary and sufficient 
condition for the existence of the minimal solution is 


l 
— pip 


= 0, 
n=O jert Qj 





TRANSIENT MARKOV CHAINS 503 


Interchanging the order of summation, and applying the renewal theorem once 
more gives the equivalent condition 


1 


— = ©, 
jel* Qj 


A slight alteration of this discussion is sufficient to establish the same condition 
when the negative integers are absorbing states, that is, if 


Pi = Dist Hf ¢ S O; a= 0 if i < 0. 


If the state space of the chain 2» , x; , --- cannot be decomposed as indicated 
in Theorem 3, complications set in and Feller’s criterion, which necessitates a 
close examination of every set of states A such that P(z, ¢ A alln|2¢A)>O 
must be referred to. Simple necessary and sufficient conditions for uniqueness 
are possible only when some uniformity, such as denumerable atomicity, is 
present. 


REFERENCES 


{1} D. Buackwe tt, “On transient Markov processes, with a countable number of states 
and stationary transition probabilities,’”’ Ann. Math. Stat., Vol. 26 (1955), pp. 
654-658. 

[2] J. L. Doos, Stochastic Processes, Chapter 7, John Wiley and Sons, New York, 1953. 

[3] W. Fevuzr, “On the integro-differential equations of purely discontinuous Markov 
processes,’’ Trans. Amer. Math. Soc., Vol. 48 (1940), pp. 488-515. 

[4] R. L. Dosrustn, “On the conditions of regularity of stationary Markov processes with 
a denumerable set of possible states,’’ Progr. Math. Sci. Moscow (N.8S.), Vol. 7 
(1952), pp. 185-191. 

(5) S. Karuin anp J. McGrecor, ‘‘Representation of a class of stochastic processes,” 
Proc. Nat. Acad. Sci., Vol. 41 (1955), pp. 387-391. 

[6] W. LeperMAN anv G. E. H. Reuter, “On the differential equations for the transition 
probabilities of Markov processes with enumerably many states,’”’ Proc. Cam- 
bridge Philos. Soc., Vol. 49 (1953), pp. 247-262. 

[7] M.S. Bartriert, An Introduction to Stochastic Processes, Cambridge, 1955. 

[8] J. L. Doon, ‘‘Markov chains—denumerable case,’’ Trans. Amer. Math. Soc., Vol. 63 
(1945), pp. 455-473. 

{9} K. L. Caune anv J. Wotrow1Tz, “On a limit theorem in renewal theory,’’ Ann. Mathe- 
matics, Vol. 55 (1952), pp. 1-6. 





JOHN WISHART 


AN APPROXIMATE FORMULA FOR THE CUMULATIVE 
z-DISTRIBUTION' 


By Joun WISHART? 


Statistical Laboratory, University of Cambridge and Princeton University 


1. Summary. A straightforward expansion and integration of the frequency 
function for Fisher’s z produces a formula for the probability that z is not ex- 
ceeded, of which the successive terms decrease rapidly when n; and nz are large. 
It is given in terms of incomplete normal moment functions (or x’ probabilities), 
and as a polynomial in zN””, where N is the harmonic mean of n; and n, . This 
last form is identical with the inverted Cornish-Fisher expansion, originally 
deduced by quite different methods. 


2. To obtain their well-known expansion for determining percentage points 
for the distribution of z (one-half of the natural logarithm of the ratio of two in- 
dependent variance estimates from normal data) in cases where the degrees of 
freedom n,; and nz are large, Cornish and Fisher (1937) used the method of the 
normalizing transformation. They developed a Gram-Charlier Type A series 
expansion which required knowledge of the cumulants of z. These they worked 
out in the approximate form for large n; and nz , to a point sufficient for the order 
of approximation worked to. The method is rather complicated, but a final 
formula is given which enables chosen percentage points to be determined. 
Although it is possible by substitution to deduce the corresponding formula for 
determining the probability associated with a chosen value of z, the author does 
not recall having seen such a formula explicitly stated.’ 


3. The frequency function of z may be manipulated directly so as to give 
on integration this inverted formula. The method is direct and simple, re- 
quires no Gram-Charlier Type A series, and no cumulants. 

Consider two independent variance estimates sj and s; from normal data, 
having degrees of freedom m and ng. z is then } In (sj / 82). For the time being 
write $n; as c; and 3m, as c,. Then the frequency function of z is 


2( /¢ ey i2c12 
m 7 2 ; ‘ 2z/_ \ey+e2’ 
B(c:, c2) (1 + ce"/e2)"” 





where the range of z is from — © to ~, and B(¢; , cz) is the Beta-Function, equal 


Received February 22, 1956. 

1 Research partially supported by the Office of Naval Research. 

2 This paper was recommended for publication after the death of the author. It is pub- 
lished, with minor emendations, after consulting with a colleague of the author. The orig 
inal title was ‘‘A new derivation of the inverted Cornish-Fisher expansion for the z-dis- 
tribution.” Ed. 

’ Reviewers note. Campbell [1] gave an expression for finding percentage points; he 
did not require a knowledge of cumulants or use the Gram-Charlier Type A series expan- 
sion. Student [4] used such an expansion for ¢ to compute his original table, and Fisher 
[3] developed the expansion by methods similar to those used here. 





Z-DISTRIBUTION 505 
to T'(e;)T(c2) / Tle ). We shall take n; as the smaller of the degrees of free- 
dom, so that c; 2 ; 


a 
‘¢: S 1. The frequency function may be writte 


) F 
2(¢;/e2)°! . 
Cr/Co) exp — (c; + ec) In (1 + qe”/e2)} 

B(c, , C2) 
‘ 2(c, /c2)** 


Bla) exp | 2a2 — (ce; + ec) In — + ¢ 
\%] » 


Coe 


(2) 


(c+) ind 1.4 8 =D) 

— (ce C2) In ¢ ———_—_—-} |. 

eS \ Ci + Ce | 
lhe first logarithm can be put into the outside term, and the second may be 

expanded, noting that c,(e* — 1) / (ce, + c) will lie between +1 and —1 ex- 

cept in the extreme tail of the distribution when mn, and n, are nearly equal and 

of the order of 30 or less. 


The frequency function then becomes 


2ci'c* 


C _ (2z)° 
a r 2 z=—- 7 — 
(ey + C2)*1* "2B (cy , C2) exp | Ci ¢,(2z) 


— Ce — e1 (22)’ 
2 2! 2aq+& 3! 
30) 0s _ _00_) as) 
C1 + Ce ! CG +) 5! 


5! 
hy 5C 30” 2z)° 
(: _ iC, 30 ar |. 
2 Cy + & (ce; + c2)?/ 6! 
where C is the harmonic mean of ¢; and cz = 2 c;€2 / (¢; + ¢2) 
Now put 2z = 2(2/C)'” 


, Whereupon the frequency function may be written 
1/2 ¢.—1/2 
V (2x)et te 


CS ens 
+eq—1/2 , 
(cy + c,)°"* B(c; » C2) 


—— 
Vv (21) 


{ 0.5 
- exp < -{(2) 229 
rt S 


’ ‘ 3C a a 
C G+ ce 3! . et \e + (4 
: Ce = Ci ba 6C ) y ( f. 15C 

a + (1 C) + z+(3 : Cy 


) 
CG + 


of this expression becomes approximately 


al 2 


12(c, + C2) + 


30C” x 
2 a) at +H. 
On expanding the I'-functions in B(c , c:) by Stirling’s formula, the first part 


1 fay 
288(c, + =) (1 + ioe, + oB8 ) 


12c, ' 288c? 
1 —) 
(14; + — ey aaa) =1-— 


N ) 
288c NU ntn 


a (? ny + vy -) 
in terms of n; and m2 and their harmonic mean N 





506 JOHN WISHART 


The second part of (4) is the normal frequency function, and the third part 
may be expanded into the following series in terms of m , nz and N: 


1 Nlea—- % 3 1 2N ) 6 ( 3N ) ‘| 
Soe eae + ca ag F = ae 
3N°5 1m + Ta” = 18N (2 ny + - “A m + . 
bi 1 ma—™% ae 2N ) a 3N ) ; 
SI0N™ my +m E (1 eta (1 sa 
j , 6N ; 1 . ON \ 3 
4 - ——s - 
+6 (1 m + | + 97902 E (1 m + =) , 


2N 3N 10 
oe ee 


3 94N 141N’ . ( 15N 30N* ) ‘| 
Z li — —_ 21 — es Dee ee 
+S ( . nN, + Ne + (my) + ) ' “7 Ny + Ne + (ny + 12)? . 


as far as terms in N~. 

We now have a frequency function for the variable x = zN“* (-x S 2 
co) in terms of the normal frequency function multiplied by a polynomial in z. 
For a chosen X, the probability P(x < X) is given by the integral of the fre- 
quency function from —« to X. Alternative forms can be found for the result 
of the integration. We may express it in terms of Pearson’s incomplete normal 
moment functions 


~X 


im wrohen! 
re V/ (2x) Jo 


m,(X) = ——— u(X) : 
(r — 1)(r — 3)---Lor2 


according as r is even or odd. Numerical values for my»(X) are given to seven 
decimals in Tables for Statisticians and Biometricians (Pearson, 1914, 1931), 
while yo(X) = P(X) — 0.5, where 


P(x) = — a [ e*? de 
, V/ (2n) — 00 


is given elsewhere in the same Tables (Table II), and also by Pearson and Hart- 
ley (1954), Table 1. 
In this form the probability P(O S x S X) is 


1 N l Py 
ie a BAAS) | SORES. ..1es 
E 6N (2 my + -) - 72N? (2 m+ =) | 


( 9 

: 2 ne — , 
. X) — sam * a(X 
yt 3N°> my + Ne ma(X) 


2N 5 ; 3N ‘ 
7 - m(X) — 3 € - = =) ma( X | 





Z-DISTRIBUTION 


Ne — % 2N J 

=. ae [40 (1 ae =) mo( X ) 

'3N ) : 6N -) . | 
ac. m(X) +9(1 - 5 m;(X) 


~ 2N iy , - 2N 
oad TON? | 385 € — i) mMy(X) _ 630 (1 — u + —) 


(1 oie ) mi(X) +21 (15 ae ea) ms(X) 


™m + Ne mM + Ne (my + 2)” 


15N 30N* | 
— 24i1— ——_—_; X i 
( m + Ne tr (m + =) ~ |} 


The probability P(— © S x S 0) is got from (7) as a special case by putting 
wo(— ©) = me,(—0) = 0.5, and ma4;(—«©) = (—2x)*. It then becomes’ 


; 1 N N y 
a See oo a ce eo ad 
E 6N (2 m + ~) 2N? (2 Mm + ne | 
N 1 N ) 
sa ee eae 
my + -) + 144N?2 (2 ny + Ne 
9 


2 RB=—% a wae ) | 
* 3-V/ (24N) my + Mm fd 45N “e ny + 


ae 2 Mm — ™ _  23N 
s+ soe ° i+ _ $+ = ; 
vie 3-V/ (2xN) mi + Ne 90N nm + Mm 
The sum or difference of (7) and (8), according as X is positive or negative, 
gives the probability P(—-» S 2s X). 
Alternatively we may write in (7) 


(29)*mor41(X) = P(X? | 2r + 2), 
2ma,(X) = P(X*|2r+ 1), 


where P(X’ | v) denotes the probability that x* does not exceed X’, for » degrees 
of freedom. These probabilities may be obtained to five decimals by subtracting 
from unity the x’ probabilities given in Pearson and Hartley (1954), Table 7. 

A series expansion for the probability P(—« Ss x Ss X) can be obtained in 
terms of P(X) and Z(X) = e*'* / y(2n), together with a polynomial in X, by 
associating (6) with Z(x) and integrating term by term by parts. This gives the 
required probability as 


, i r 2 
1 = a (2 - N ) + an 1 (2 - N y\ 
| 6N NM, + Ne 72N? m + N2/ } 


4 Reviewer’s note. The algebraic signs for m2,(—<) should be the opposites of those 
given here; when X is negative, (7) takes negative values. 





JOHN WISHART 


6 
(1 - ny ¥ —_3N-) x 


N 
2- r xX ) 
+( (x * 3x)| AX) + oN (2 - —.,) saa 


1 Ne — Ny ( 2N =) 78 11N ) 6 
ee én — el co ae 
r 810N'* ny + mite (1 m + ne “ , (1 m + Ns 
> N r4 72 \ > 
+ 6 - (X* + 4X° + 8)> Z(X) 
: Mm + ne ; 
Poh Cow) ( 2N a 
arson (5(1 a) pan oe 
32N 52N 103N? 1 
(7 — 32N_)\ yo a ( _ _52N_ aes) 7 
(7 my + =) \ my + Ne * (nm, + ne)? - 


+9(: tidal ) x + 5X? + 15x)} 20%) 
Mm + Ne 


1 ae N ) . 
+ aay (2 my + Ne de x). 


On multiplying in by the outside factor this becomes 
—m X*+2 . J 2N A - 
P(X) + Z(X) > +n. 3N5- 18N (2 Mm + NM 4 
N ) ss aunt 1 m—m™ 5( 2N )x 
+(2 — ——— xX" + 3X ——_— ~ , 
( my + Ne ; ) ¥ SLON'5 nm + m 1 nr +N - 
‘ 11N ) oe ( N ) 4 
f a gencegheer. Wo ( a 
. (: m+ % aes . % mM + M% - 
23N \/ ws ) 1 2N \ 
3{2 r+ 2) — = 45 — ——— 2 
s( v4 m+ =e: a ) 9720N? ” € m + = J x 
2 29NT : 5S 
5 (1 - - 1G a )x’ +9 (4 — —— 
m + 1 Mm + % m + nm 


103N° ) . N \ 19N .) 6 
tion Pe —~ O9lg— 8 — xX 
(m + M2)? ( Mm + Ne m + Mm 


“ N ail .\ 
5({2 — X + 3X)? i}. 
45 ( _w) (X x} | 


This is the expression which is the “direct” form of the Cornish-Fisher expan- 
sion, yielding, to terms in N~’, the probability that z shall not exceed zN~*”. 
Additional terms could be worked out by noting that the terms of the exponen- 
tial in (3) are equivalent to the binomial cumulants, but the terms in (9) should 


=~ X°+32 s l 
7 CMF iat 
[Poo + BS. ~3NeS- Z(X) 18N 





Z-DISTRIBUTION 509 


suffice for W of the order of 50 or above, and fewer terms will do if the probability 
is not required to a large number of significant figures. 

For the benefit of those accustomed to the notation of Cornish and Fisher, 
(9) may be put into the form 


. 2 
u+e | 2 (x? 42) = de daca" + 3X) +5 (ax + X°+ 3x)} 


(40) 6 | ~ yé yd 2 se 
+ 5 {9X + 6X‘ + 9X* + 18) +5 
o 


+ 620 
(10) - (10X* — 55X° — 6X* — 69X’ — 138)} — (40)" 
' 38880 
( 2 
-427(5X’ + 3X° — 15X° — 45X) + 18 y 
3 


\ 


-(10X* — 51X’ — 27X° — 15X* — 45X) 
4 

a = (20X" — 320X* + 927X’ — 171X° — 45xX* — 1356x)\, 
o 


but note that z here is what we have hitherto written as Z(X). In using (10) we 
take X as the chosen value of Fisher’s z divided by ~/ (3), i.e. by its approxi- 
mate standard derivation, [}(1/m + 1/n,)]'”; 6 is, of course, 1/n; — 1/ne. 

The order of the terms in (9) or (10) may be seen if we choose as an example 
nm, = 60, n. = 120,Z = (+/5) / 20 = 0.1118034. Then N = 80, while +/($c) = 
(1/5) / 20 and 5/e = 3, and X = 1. Using Pearson and Hartley’s Table 1 we 
find for the probability that this chosen value of Z is not exceeded 


0.8413 44 
90 
10 
21 
16 


0.8493 216 

so that we are here close to the 15 per cent point of the z distribution. 

When n = m = n we have N = n, N/(m + nm) = 3, also 6 = 0, 4e = 
n*. Then (7) and (8) give 

, 4 m(X) 32m(X) — 35ms(X) 
} 1—— ox) ( xX wath tw iid emia I 

ye +t ( an * 32n?/ tual ae 4n 96n? 
while (9) or (10) gives 


if se - 1440n? 


When nm = ”, % = ©, we have an expansion from which can be calculated 
oge 2 ° 
the probability that a chosen value of x’, for n degrees of freedom, is not exceeded. 


(12) P(X) — Z(X) 4 


\ 


‘(X(X* +3) , 5X’ + 3X* — 15X(X* + 3)\ 
} 





510 D. TEICHROEW 


If this value be denoted x3, then we take X = V/ (}n) -In(x?/n), so that we are 
effectively transforming x’ by first forming the ratio of x’ to its mean, raised to 
the power of its standard deviation, and then taking one-half the natural loga- 
rithm of this quantity. The expansion for the probability may be obtained from 
(7) and (8), or from (9), by putting N = 2n, (m. — m)/(m + m) = 1 and 
N / (m + m) = 0, or from (10) with o = 6 = n™. It has been developed from 
first principles by the author in [7]. 


REFERENCES 


{1] G. A. Camppe.i, “Probability curves showing Poisson’s exponential summation,”’ 
Bell System Technical Journal, Vol. 2 (1923), pp. 95-113; and Collected Papers, 
N. Y. (1937), pp. 224-242. 

{2] E. A. CornisH aNnp R. A. Fisuer, Revue de l'Institut International de Statistique, Vol. 
4 (1937), p. 307. 

(3] R. A. Frsuer, ‘“The asymptotic approach to Behrens’ integral with further tables for 
the d-test of significance,’’ Ann. Eng. Lond., Vol. 11 (1941), p. 151. 

[4] M. G. Kenpa.u, ‘‘Advanced Theory of Statistics,’’ Vol. II, Section 21.10, p. 101. 

[5] K. Pearson, Tables for Statisticians and Biometricians, Part I (Table IX); Part II 
(Table XIII), Biometric Laboratory, London, 1914, 1931. 

{6] E. S. Pearson anp H. O. Hartiey, Biometrika Tables for Statisticians, Vol. I, Cam- 
bridge University Press, 1954. 

{[7] J. Wisuart, ‘‘x? probabilities for large numbers of degrees of freedom’’ Biometrika, 
Vol. 43 (1956), pp. 92-95. 


ee a ee 


THE MIXTURE OF NORMAL DISTRIBUTIONS WITH 
DIFFERENT VARIANCES! 


By D. Trtcurorew’ 
University of California, Los Angeles 


1. Introduction. In some practical problems, the observed variable may have 
«1 normal distribution whose variance varies from one observation to the next. 
The purpose of this note is to give the formula for the marginal distribution when 
the variances are assumed to be distributed according to the Gamma distribu- 
tion. 


2. The distribution in the general case. We assume that the conditional den- 
sity of X, given o’, is 


a(2n)* 


Received May 25, 1956; revised July 18, 1956. 

1 The preparation of this paper was sponsored (in part) by the Office of Naval Research, 
USN. 

? Present address: National Cash Register Company, Hawthorne, California. 


oe 
f(z/o’) = gor —-x<z<w, o>, 





NORMAL DISTRIBUTIONS 


and that the density function of the variance is 
a 
ew 1 
g(o*) = ray ° Ca) a>0, A>0. 


Multiplying these two densities together and integrating immediately yields the 
marginal density function of X in the form 


f(z) = TO Gay | exp {—[ao® + (2’/20’)]} (07)? do’, 


which, using a formula for the modified Hankel function [3], p. 39, gives 


a" (e/a) ky_yalav/ 2a) 
10 = vO 


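The distribution function of $X$ could be obtained by integrating the density function or by evaluating two hypergeometric functions. By the Paul Lévy inversion formula ([4], p. 93, Eq. (10.3.1)), the well-known relation between $\sin z$ and $J_{1/2}(z)$, and Formula 1 of [2] (p. 434), we have, for $x \ge 0$,

$$F(x) = \frac12 + \frac{x\sqrt{\alpha}\,\Gamma(\lambda - \frac12)}{\sqrt{2\pi}\,\Gamma(\lambda)}\; {}_1F_2\!\left(\tfrac12; \tfrac32, \tfrac32 - \lambda; \frac{\alpha x^2}{2}\right) + \frac{\Gamma(\frac12 - \lambda)\,\alpha^{\lambda} x^{2\lambda}}{2^{\lambda+1}\sqrt{\pi}\,\Gamma(\lambda + 1)}\; {}_1F_2\!\left(\lambda; \lambda + \tfrac12, \lambda + 1; \frac{\alpha x^2}{2}\right),$$

where ${}_pF_q$ denotes a generalized hypergeometric function, defined here as

$${}_1F_2(\beta_1; \gamma_1, \gamma_2; z) = \sum_{n=0}^{\infty} \frac{(\beta_1)_n\, z^n}{(\gamma_1)_n\,(\gamma_2)_n\, n!},$$

where $(\beta)_n = \beta(\beta + 1)\cdots(\beta + n - 1)$ and $(\beta)_0 = 1$.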
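The two-series expression can be summed directly. The sketch below is a minimal standard-library implementation. It assumes $x \ge 0$; negative arguments are handled by the symmetry $F(-x) = 1 - F(x)$. It also assumes that $\lambda - \tfrac12$ is not an integer, since otherwise $\Gamma(\tfrac12 - \lambda)$ or the lower parameter $\tfrac32 - \lambda$ is singular. The helper names and the fixed number of series terms are our choices. For λ = 1 (a Laplace law) and λ = 2 the distribution function is elementary, so these two cases serve as independent checks.

```python
import math


def hyp1f2(a, b1, b2, y, terms=200):
    """Generalized hypergeometric 1F2(a; b1, b2; y), summed term by term."""
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * y / ((b1 + k) * (b2 + k) * (k + 1))
        total += term
    return total


def mixture_cdf(x, alpha, lam):
    """F(x) for the normal variance mixture; requires lam - 1/2 not an integer."""
    if x < 0:
        return 1.0 - mixture_cdf(-x, alpha, lam)
    y = alpha * x * x / 2.0
    first = (x * math.sqrt(alpha) * math.gamma(lam - 0.5)
             / (math.sqrt(2.0 * math.pi) * math.gamma(lam))
             * hyp1f2(0.5, 1.5, 1.5 - lam, y))
    # (alpha x^2 / 2)^lam / (2 sqrt(pi)) equals alpha^lam x^(2 lam) / (2^(lam + 1) sqrt(pi))
    second = (math.gamma(0.5 - lam) * y ** lam
              / (2.0 * math.sqrt(math.pi) * math.gamma(lam + 1.0))
              * hyp1f2(lam, lam + 0.5, lam + 1.0, y))
    return 0.5 + first + second
```

The two series have large terms of opposite sign when $\alpha x^2$ is large, so this direct summation is best suited to moderate arguments.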


The density and distribution function can also be obtained from the characteristic function, which is

$$\varphi(t) = \frac{1}{(1 + t^2/2\alpha)^{\lambda}}.$$


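3. The distribution when λ is an integer. For $\lambda = n$, an integer, from [1], p. 40 and [1], p. 128, No. 67b, we get

$$f(x) = \frac{\sqrt{2\alpha}\, e^{-|x|\sqrt{2\alpha}}}{(n-1)!\,2^{2n-1}} \sum_{\nu=0}^{n-1} \frac{(2n-\nu-2)!\,\left(2|x|\sqrt{2\alpha}\right)^{\nu}}{\nu!\,(n-\nu-1)!}.$$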
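For integer λ = n the density is an exponential times a polynomial in $z = |x|\sqrt{2\alpha}$. Its normalization and variance can therefore be checked exactly with rational arithmetic, using $\int_0^\infty z^k e^{-z}\,dz = k!$. The total mass should be 1, and $\alpha\,E(X^2)$ should equal $n$, since the variance of $X$ is $\lambda/\alpha$. This is a minimal sketch of that check. The helper names are ours, and nothing beyond the displayed formula is assumed.

```python
import math
from fractions import Fraction


def coefficients(n):
    """Prefactor 1/((n-1)! 2^(2n-1)) and the coefficients c_nu of z^nu in the sum."""
    pref = Fraction(1, math.factorial(n - 1) * 2 ** (2 * n - 1))
    coeffs = [Fraction(math.factorial(2 * n - v - 2) * 2 ** v,
                       math.factorial(v) * math.factorial(n - v - 1)) for v in range(n)]
    return pref, coeffs


def density(x, alpha, n):
    """Closed-form density for integer lambda = n."""
    z = abs(x) * math.sqrt(2.0 * alpha)
    pref, c = coefficients(n)
    poly = sum(float(cv) * z ** v for v, cv in enumerate(c))
    return math.sqrt(2.0 * alpha) * math.exp(-z) * float(pref) * poly


def exact_mass_and_scaled_variance(n):
    """Exact (integral of f, alpha * E[X^2]); substituting x = z / sqrt(2 alpha) removes alpha."""
    pref, c = coefficients(n)
    mass = 2 * pref * sum(cv * math.factorial(v) for v, cv in enumerate(c))
    scaled_var = pref * sum(cv * math.factorial(v + 2) for v, cv in enumerate(c))
    return mass, scaled_var
```

For n = 1 the density is the Laplace density, and for n = 2 it is $\sqrt{2\alpha}\,e^{-z}(1+z)/4$. Both cases appear in the checks.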


The distribution function can also be expressed in closed form if $\lambda = n$, an integer, by the following formula ([1], p. 127, No. 66c):

$$F(x) = \frac12 + \frac1\pi\int_0^\infty \frac{\sin xt}{t}\,\frac{dt}{(1 + t^2/2\alpha)^n} = 1 - \frac{e^{-x\sqrt{2\alpha}}}{2^n\,(n-1)!}\,F_{n-1}\!\left(x\sqrt{2\alpha}\right),$$

where $F_0(z) = 1$, $F_1(z) = z + 2$, and $F_n(z) = (z + 2n)F_{n-1}(z) - zF'_{n-1}(z)$, for $\alpha > 0$; $x \ge 0$; $n = 1, 2, 3, \ldots$. These recurrence relations could be used to compute a table of the distribution function.
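The recurrence is easy to mechanize. The minimal sketch below assumes the closed form $F(x) = 1 - e^{-z}F_{n-1}(z)/(2^n (n-1)!)$ with $z = x\sqrt{2\alpha}$ for $x \ge 0$. It also assumes the recurrence $F_n(z) = (z + 2n)F_{n-1}(z) - zF'_{n-1}(z)$. Both reproduce the elementary cases n = 1, 2, 3, which can be obtained by integrating the λ = n density term by term. Each $F_k$ is stored as a list of integer coefficients, lowest power first. The function names are illustrative.

```python
import math


def next_poly(prev, n):
    """Coefficients of F_n = (z + 2n) F_{n-1} - z F'_{n-1}, given those of F_{n-1}."""
    out = [0] * (len(prev) + 1)
    for k in range(len(out)):
        lower = prev[k - 1] if k >= 1 else 0      # contribution of z * F_{n-1}
        same = prev[k] if k < len(prev) else 0    # 2n * F_{n-1} minus z * F'_{n-1}
        out[k] = lower + (2 * n - k) * same
    return out


def tail_polys(n_max):
    """[F_0, F_1, ..., F_n_max] as integer coefficient lists."""
    polys = [[1]]
    for n in range(1, n_max + 1):
        polys.append(next_poly(polys[-1], n))
    return polys


def cdf_integer(x, alpha, n):
    """F(x) for integer lambda = n; negative x by symmetry."""
    if x < 0:
        return 1.0 - cdf_integer(-x, alpha, n)
    z = x * math.sqrt(2.0 * alpha)
    poly = tail_polys(n - 1)[n - 1]
    value = sum(c * z ** k for k, c in enumerate(poly))
    return 1.0 - math.exp(-z) * value / (2 ** n * math.factorial(n - 1))
```

As checks, $F(0) = \tfrac12$ for every n, since $F_{n-1}(0) = 2^{n-1}(n-1)!$. In addition, a central difference of $F$ for n = 3 reproduces the density $\sqrt{2\alpha}\,e^{-z}(3 + 3z + z^2)/16$.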


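4. Moments. The moments are obtainable directly from the expansion of the characteristic function

$$\left(1 + \frac{t^2}{2\alpha}\right)^{-\lambda} = 1 - \frac{\lambda}{1!}\,\frac{t^2}{2\alpha} + \frac{\lambda(\lambda+1)}{2!}\left(\frac{t^2}{2\alpha}\right)^2 - \frac{\lambda(\lambda+1)(\lambda+2)}{3!}\left(\frac{t^2}{2\alpha}\right)^3 + \cdots.$$

We have

$$\mu_2 = \frac{\lambda}{\alpha}, \qquad \mu_4 = \frac{3\lambda(\lambda+1)}{\alpha^2}, \qquad \beta_1 = 0, \qquad \beta_2 = 3\left(1 + \frac{1}{\lambda}\right).$$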


As one would expect, the variance of $X$ increases as $\lambda$ increases (for fixed $\alpha$). It is interesting
to note that $\beta_2$ is always greater than 3.
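Every even moment can be read off the same expansion. Write $(\lambda)_k = \lambda(\lambda+1)\cdots(\lambda+k-1)$ and match the coefficient of $t^{2k}$ with $(-1)^k\mu_{2k}/(2k)!$. This gives $\mu_{2k} = (2k)!\,(\lambda)_k / (k!\,(2\alpha)^k)$. The sketch below evaluates this with exact rationals. It recovers $\mu_2$, $\mu_4$ and $\beta_2$ as stated and confirms $\beta_2 > 3$ for a range of λ. The helper names are ours.

```python
from fractions import Fraction
from math import factorial


def even_moment(k, lam, alpha):
    """E[X^(2k)] = (2k)! (lam)_k / (k! (2 alpha)^k), from the series of (1 + t^2/2alpha)^(-lam)."""
    lam, alpha = Fraction(lam), Fraction(alpha)
    rising = Fraction(1)
    for j in range(k):
        rising *= lam + j
    return factorial(2 * k) * rising / (factorial(k) * (2 * alpha) ** k)


def kurtosis(lam, alpha):
    """beta_2 = mu_4 / mu_2^2."""
    return even_moment(2, lam, alpha) / even_moment(1, lam, alpha) ** 2
```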
REFERENCES 

1. W. Gröbner and N. Hofreiter, Integraltafel, Zweiter Teil, Bestimmte Integrale, Springer-Verlag, 1950.

2. G. N. Watson, A Treatise on the Theory of Bessel Functions, Macmillan, New York, 2nd edition, 1948.

3. W. Magnus and F. Oberhettinger, Formeln und Sätze für die speziellen Funktionen der mathematischen Physik, Springer-Verlag, Berlin, 1948.

4. H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.




METRICS AND NORMS ON SPACES OF RANDOM VARIABLES 


By A. J. Thomasian¹
University of California, Berkeley 


1. Introduction and summary. Let $\mathfrak{X}$ be the space of random variables defined on an abstract probability space $(\Omega, \mathcal{A}, P)$, where we consider any two elements of $\mathfrak{X}$ which are equal a.s. (almost surely) as the same. Fréchet [2] exhibited a metric on $\mathfrak{X}$ (for example, $E[|X - Y|/(1 + |X - Y|)]$) with the property that convergence in the metric is equivalent to convergence in probability, and he showed that for some probability spaces the same cannot be done for convergence a.s. Dugué [1] showed that it is not in general possible to define a norm on $\mathfrak{X}$ such that convergence in the norm is equivalent to convergence in probability. These results are contained in and completed by the following fact, which was stated without proof by the author in [5] and which follows easily from the two theorems stated and proved in this note: there exists a metric (norm) on $\mathfrak{X}$ with convergence in the metric (norm) equivalent to convergence a.s. (in probability) if, and only if, $\Omega$ is the union of a countable (finite) number of disjoint atoms. After these results were obtained it was found that the equivalence of parts (ii) and (iii) of Theorem 1 had been proved by Marczewski [4], p. 121.

Received May 23, 1956; revised October 8, 1956.

¹ This paper was prepared while the author held a National Science Foundation Fellowship.

An atom of a probability space is a measurable set $A$ with $P(A) > 0$ such that any measurable subset has probability 0 or $P(A)$. It is easy to show that a random variable is a.s. constant on an atom. $f$ will always designate a real-valued function defined on $\mathfrak{X}$. Convergence in $f$ is said to be equivalent to convergence a.s. (in probability) if, for every sequence $\{X_n\}$ of elements from $\mathfrak{X}$, $f(X_n) \to 0$ if, and only if, $X_n \to 0$ a.s. (in probability).

THEOREM 1. The following conditions on a probability space are equivalent.

(i) There exists a function $f$ such that convergence in $f$ is equivalent to convergence a.s.

(ii) For any sequence $\{X_n\}$ from $\mathfrak{X}$, if $X_n \to 0$ in probability, then $X_n \to 0$ a.s.

(iii) $\Omega$ is a countable union of disjoint atoms.

THEOREM 2. The following conditions on a probability space are equivalent.

(a) There exists a function $f$ such that convergence in $f$ is equivalent to convergence in probability and $f$ satisfies $|f(aX)| = |a|\cdot|f(X)|$ for any $X \in \mathfrak{X}$ and any real number $a$.

(b) $\Omega$ is a finite union of disjoint atoms.


2. Proof of Theorem 1. The following well-known result (see Loève [3], p. 100, Example 7) will be used in the proof.

THEOREM A. For any probability space, $\Omega = A + \sum_i A_i$, where all of the sets in the decomposition are disjoint and each $A_i$ is the empty set or an atom, and for every measurable subset $B$ of $A$, $P$ takes every value between 0 and $P(B)$ on measurable subsets of $B$.

(ii) implies (i) by the result of Fréchet. 

To show that (i) implies (ii), assume (i) and take any sequence $X_n \to 0$ in probability. If $f(X_n) \not\to 0$, then there exist a subsequence $X_{n'}$ and an $\epsilon > 0$ such that $|f(X_{n'})| > \epsilon$. But $X_{n'} \to 0$ in probability, so it has a subsequence $X_{n''} \to 0$ a.s. Thus $f(X_{n''}) \to 0$, contradicting $|f(X_{n'})| > \epsilon$. Therefore $f(X_n)$ must converge to 0; hence $X_n \to 0$ a.s.

(ii) follows easily from (iii) since a random variable is a.s. constant on an 
atom. 

To prove that (ii) implies (iii), assume that (iii) is false. Then in the decomposition of Theorem A, $P(A) > 0$, and for each $n$, $A = \sum_{i=1}^{n} A_{ni}$, where $P(A_{ni}) = (1/n)P(A)$ for $i = 1, 2, \ldots, n$ and the sets $A_{n1}, A_{n2}, \ldots, A_{nn}$ are disjoint. Let $X_{ni}$ be the characteristic function of the set $A_{ni}$. The sequence of random variables

$$X_{11},\ X_{21},\ X_{22},\ X_{31},\ X_{32},\ X_{33},\ \ldots$$

converges to 0 in probability, since $P(X_{ni} \neq 0) = P(A)/n$. It does not converge a.s., since every point of $A$ lies in exactly one $A_{ni}$ for each $n$. This contradicts (ii); hence (ii) implies (iii), completing the proof.
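The counterexample is the familiar "typewriter" sequence. In the sketch below we assume, for illustration only, that the non-atomic part $A$ is the unit interval $[0, 1)$ with Lebesgue measure and that $A_{ni} = [(i-1)/n, i/n)$. The Fréchet distance $E[|X|/(1+|X|)]$ of each indicator from 0 is then $P(A_{ni})/2 = 1/(2n)$, which tends to 0. Yet every fixed point ω lies in exactly one $A_{ni}$ for each $n$, so $X_{ni}(\omega) = 1$ infinitely often. The function names are hypothetical.

```python
from fractions import Fraction


def typewriter(n_max):
    """Yield (n, i, left, right) with A_ni = [left, right) = [(i-1)/n, i/n), n = 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(1, n + 1):
            yield n, i, Fraction(i - 1, n), Fraction(i, n)


def frechet_distance_to_zero(left, right):
    """E[|X|/(1+|X|)] for X the indicator of [left, right): X/(1+X) is 1/2 on the set, else 0."""
    return (right - left) / 2


def hits(omega, n_max):
    """Number of indices (n, i), n <= n_max, with X_ni(omega) = 1."""
    return sum(1 for _, _, left, right in typewriter(n_max) if left <= omega < right)
```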


3. Proof of Theorem 2. To prove that (a) implies (b), assume that (a) is true and (b) is false. From Theorem A there exists a sequence $\{A_n\}$ of events with $0 < P(A_n) \to 0$. Let $X_n$ be the characteristic function of the set $A_n$. For all $n$, $f(X_n) \neq 0$, because if $f(X_n) = 0$, then by (a) the sequence of random variables each of which is $X_n$ must converge to 0 in probability, contradicting $P(A_n) > 0$. By (a), $|f(X_n/f(X_n))| = 1$ for all $n$, so that the sequence of random variables $X_n/f(X_n)$ cannot converge to 0 in probability. However, it must, because $P(A_n) \to 0$. A contradiction has been reached; hence (a) implies (b).

Assuming (b), it is easy to show that $f(X) = E|X|$ is a norm on $\mathfrak{X}$ such that convergence in $f$ is equivalent to convergence in probability. Theorem 2 is proved.


4. Acknowledgment. The author wishes to thank Professor M. Loéve for 
suggesting this problem. 


REFERENCES 

[1] D. Dugué, "L'existence d'une norme est incompatible avec la convergence en probabilité," C. R. Acad. Sci. Paris, Vol. 240 (1955), p. 1307.

[2] M. Fréchet, Généralités sur les Probabilités. Éléments Aléatoires, Gauthier-Villars, 1935.

[3] M. Loève, Probability Theory, D. Van Nostrand, New York, 1955.

[4] E. Marczewski, "Remarks on the convergence of measurable sets and measurable functions," Colloquium Math., Vol. 3 (1955), pp. 118-124.

[5] A. J. Thomasian, "Distances et normes sur les espaces de variables aléatoires," C. R. Acad. Sci. Paris, Vol. 242 (1956), p. 447.


DIVERGENT TIME HOMOGENEOUS BIRTH AND DEATH PROCESSES¹
By Peter W. M. John


University of New Mexico 


1. Introduction. In a time-homogeneous birth and death process a population is considered, the size of which is given by the random variable $n(t)$, taking values in the non-negative integers. If at time $t$ the population size is $n$, the probability that a birth occurs in the time interval $(t, t + \Delta t)$ is $\lambda_n \Delta t + o(\Delta t)$; the probability of a death is $\mu_n \Delta t + o(\Delta t)$, and the probability of the occurrence of more than one event is $o(\Delta t)$. The parameters $\lambda_n$ and $\mu_n$ are non-negative and are independent of $t$. The probabilities $p_n(t)$ that the population size is $n$ at time $t$ then satisfy the inequality (Feller [4]) $\sum_n p_n(t) \le 1$. We shall impose the initial condition $n(0) = 1$.

Received January 17, 1956; revised September 24, 1956.

¹ These results were included in a dissertation submitted to the University of Oklahoma in partial fulfillment of the requirements for the Ph.D. degree in mathematics, August, 1955.

It is well known that under certain conditions the strict inequality $\sum_n p_n(t) < 1$ holds. The physical interpretation of this inequality is that there is a positive probability that an infinite number of events occur in the finite time $t$.

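We consider here the case where $\lambda_0 = 0$; if $\mu_1 > 0$, the state $n = 0$ is an attainable absorbing barrier. A necessary and sufficient condition for the occurrence of the phenomenon in this case is that the series

$$\sum_{n=1}^{\infty} \frac{1}{\lambda_n}\left(1 + \frac{\mu_n}{\lambda_{n-1}} + \frac{\mu_n\mu_{n-1}}{\lambda_{n-1}\lambda_{n-2}} + \cdots + \frac{\mu_n\mu_{n-1}\cdots\mu_2}{\lambda_{n-1}\lambda_{n-2}\cdots\lambda_1}\right) \tag{1.1}$$

shall converge.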

This result has been obtained in various equivalent forms by D. G. Kendall 
(unpublished, quoted by Bartlett [1]), Dobrusin [3], Karlin and McGregor [5], 
and Reuter and Ledermann [6]. 

This paper will present a simpler derivation of the result, which will at the 
same time emphasize the physical significance of the terms of the series. 


2. Passage Times. We shall denote by $t_m$ the time taken for $n$ to increase from $m$ to $m + 1$, and consider the expected time $\tau_m$ of such a change. If $\mu_1 > 0$, it is necessary to interpret the $\tau_m$ as conditional expected times, conditional upon non-absorption.

THEOREM 1. $\tau_m$ is given by the recursion formula

$$\tau_m = \frac{1}{\lambda_m} + \frac{\mu_m}{\lambda_m}\,\tau_{m-1}. \tag{2.1}$$

PROOF. The probability density function for the time $t$ elapsing until the occurrence of the first event after the population size has reached $m$ is

$$f(t) = (\lambda_m + \mu_m)\exp[-(\lambda_m + \mu_m)t]. \tag{2.2}$$

The expected value of $t$ is thus $1/(\lambda_m + \mu_m)$. Such an event has probability $\lambda_m/(\lambda_m + \mu_m)$ of being a birth, in which case the population has passed from $m$ to $m + 1$ as required. It has probability $\mu_m/(\lambda_m + \mu_m)$ of being a death, in which case the desired increase requires further passage from $m - 1$ to $m$ and then from $m$ to $m + 1$.

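We thus have

$$\tau_m = \frac{\lambda_m}{\lambda_m + \mu_m}\cdot\frac{1}{\lambda_m + \mu_m} + \frac{\mu_m}{\lambda_m + \mu_m}\left(\frac{1}{\lambda_m + \mu_m} + \tau_{m-1} + \tau_m\right), \tag{2.3}$$

whence

$$\lambda_m \tau_m = 1 + \mu_m \tau_{m-1}. \tag{2.4}$$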





It follows that 


(2.5) 


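$$\tau_m = \frac{1}{\lambda_m}\left(1 + \frac{\mu_m}{\lambda_{m-1}} + \frac{\mu_m\mu_{m-1}}{\lambda_{m-1}\lambda_{m-2}} + \cdots + \frac{\mu_m\mu_{m-1}\cdots\mu_2}{\lambda_{m-1}\lambda_{m-2}\cdots\lambda_1}\right). \tag{2.6}$$

If $t_\infty$ denotes the time of passage to infinity, its expected value is given by

$$\bar t_\infty = \sum_{m=1}^{\infty} \tau_m. \tag{2.7}$$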
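The recursion (2.1) and the explicit sum (2.6) are easy to compare with exact rational arithmetic. In the sketch below, the rates $\lambda_n = n^2$, $\mu_n = n$ are purely hypothetical, chosen only to give a process with an absorbing state. The starting value $\tau_1 = 1/\lambda_1$ is the $m = 1$ case of (2.6). A second check uses the pure-birth (Yule) rates $\lambda_n = n$, $\mu_n = 0$. For these, $\tau_m = 1/m$ and the partial sums of (2.7) are harmonic numbers, so $\bar t_\infty = \infty$ and the process does not diverge.

```python
from fractions import Fraction


def taus_recursive(lam, mu, m_max):
    """tau_1..tau_m_max from (2.1): tau_m = 1/lam_m + (mu_m/lam_m) tau_{m-1}, with tau_1 = 1/lam_1."""
    taus = [Fraction(1, lam(1))]
    for m in range(2, m_max + 1):
        taus.append(Fraction(1, lam(m)) + Fraction(mu(m), lam(m)) * taus[-1])
    return taus


def tau_explicit(lam, mu, m):
    """tau_m from (2.6): (1/lam_m)(1 + mu_m/lam_{m-1} + ... + mu_m...mu_2/(lam_{m-1}...lam_1))."""
    total, prod = Fraction(1), Fraction(1)
    for j in range(1, m):
        prod *= Fraction(mu(m - j + 1), lam(m - j))   # append the factor mu_{m-j+1} / lam_{m-j}
        total += prod
    return total / lam(m)
```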

3. Divergence of the Process. We proceed to obtain the main results. 

THEOREM 2. If $\bar t_\infty$ is finite, there are values of $t$ for which $\sum_n p_n(t) < 1$.


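PROOF. $\sum_n p_n(t) = 1$ implies that the probability that $t_\infty < t$ is zero, which in turn implies that

$$P(t_\infty \ge t) = 1. \tag{3.1}$$

Using Cramér's generalization of the Tchebycheff inequality [2], we have for all $t > 0$,

$$P(t_\infty \ge t) \le \frac{\bar t_\infty}{t}, \tag{3.2}$$

so that for $t > \bar t_\infty$,

$$\sum_{n=0}^{\infty} p_n(t) = P(t_\infty \ge t) \le \frac{\bar t_\infty}{t} < 1, \tag{3.3}$$

and indeed, by taking $t$ large enough, $\sum_{n=0}^{\infty} p_n(t)$ may be made as small as we wish. Thus, if $\bar t_\infty$ is finite, then for all $t > \bar t_\infty$, $\sum_{n=0}^{\infty} p_n(t) < 1$.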
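A small simulation makes Theorem 2 concrete. It uses hypothetical pure-birth rates $\lambda_n = n^2$ with $\mu_n = 0$. The holding times are then independent exponentials, so $t_\infty = \sum_n \mathrm{Exp}(n^2)$ and $\bar t_\infty = \sum 1/n^2 = \pi^2/6 < \infty$. A positive fraction of paths therefore explode before any $t > 0$, and the bound (3.2) caps $P(t_\infty \ge t)$. The sketch truncates the sum at 400 terms; the omitted mean is about 0.0025. It uses a fixed seed, and all numbers it prints are illustrative.

```python
import math
import random


def explosion_time(rng, n_terms=400):
    """One sample of t_inf = sum of Exp(lambda_n) holding times with lambda_n = n^2 (truncated)."""
    return sum(rng.expovariate(n * n) for n in range(1, n_terms + 1))


def simulate(samples=2000, seed=1957):
    rng = random.Random(seed)
    return [explosion_time(rng) for _ in range(samples)]


mean_t_inf = math.pi ** 2 / 6          # exact E(t_inf) for these rates
times = simulate()
for t in (1.0, 2.0, 4.0):
    tail = sum(s >= t for s in times) / len(times)   # estimates sum_n p_n(t)
    print(f"t={t}: P(t_inf >= t) ~ {tail:.3f}, bound (3.2) = {min(1.0, mean_t_inf / t):.3f}")
```

Replacing `n * n` by `n` (the Yule process) makes the sampled sums grow like log(n_terms) instead, and no explosion occurs.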
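THEOREM 3. If there is a finite time $\tau$ such that $\sum_n p_n(\tau) < 1$, then $\bar t_\infty$ is finite.

PROOF. Suppose that

$$\sum_{n=0}^{\infty} p_n(\tau) = 1 - a, \qquad a > 0; \tag{3.4}$$

then

$$P[n(\tau) < \infty] = 1 - a \quad\text{and}\quad P[n(\tau) = \infty] = a, \tag{3.5}$$

$$P[n(m\tau) < \infty] \le (1 - a)^m, \tag{3.6}$$

so that

$$P[n(m\tau) < \infty,\ n((m+1)\tau) = \infty] \le (1 - a)^m; \tag{3.7}$$

thus

$$\bar t_\infty \le \sum_{m=0}^{\infty} (m + 1)\tau\, P[n(m\tau) < \infty,\ n((m+1)\tau) = \infty] \le \sum_{m=0}^{\infty} (m + 1)\tau(1 - a)^m = \tau \sum_{m=0}^{\infty} (m + 1)(1 - a)^m. \tag{3.8}$$

But the series $\sum (m + 1)x^m$ converges for $|x| < 1$; therefore $\bar t_\infty$ is finite.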

COROLLARY 3.1. A necessary and sufficient condition for the process to be divergent is that $\bar t_\infty$ shall be finite.

The result of (1.1) follows immediately. 

COROLLARY 3.2. For a birth and death process with no lower absorbing barrier, $P(t_\infty < \infty)$ is either zero or 1.

PROOF. If $\bar t_\infty$ is finite then, from Theorem 2, we have for all $t > \bar t_\infty$,

$$P(t_\infty \ge t) \le \frac{\bar t_\infty}{t}.$$

But $\bar t_\infty/t \to 0$ as $t \to \infty$, so that

$$\lim_{t\to\infty} P(t_\infty < t) = 1, \quad\text{or equivalently}\quad \lim_{t\to\infty} \sum_n p_n(t) = 0. \tag{3.9}$$

It follows immediately from Theorem 3 that, if $P(t_\infty < \infty)$ is not zero, then $\bar t_\infty$ is finite, so that the probability must be 1.


4. Acknowledgements. I wish to thank Professor Casper Goffman for his assistance and advice during the direction of this work; I wish also to thank the referees for their helpful suggestions and for drawing my attention to references [1], [3] and [5].


REFERENCES 


[1] M. S. Bartlett, Stochastic Processes, Cambridge University Press, 1955, p. 88.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 182.

[3] R. L. Dobrusin, Uspehi Matem. Nauk (N.S.) 7 (1952), No. 6 (52), pp. 185-191. Abstract in Math. Rev., Vol. 14, p. 567.

[4] W. Feller, An Introduction to Probability Theory and Its Applications, John Wiley & Sons, New York, 1950, pp. 371-373.

[5] S. Karlin and J. McGregor, "Representation of a class of stochastic processes," Proc. Nat. Acad. Sci., Vol. 41, pp. 387-391.

[6] G. E. H. Reuter and W. Ledermann, "On the differential equations for the transition probabilities of Markov processes with enumerably many states," Proc. Cambridge Philos. Soc., Vol. 49 (1953), pp. 247-262.




A REGRESSION ANALYSIS USING THE INVARIANCE METHOD 
By D. A. S. Fraser


University of Toronto 


1. Summary. The invariance method is applied to a regression problem for 
which the “errors” have a rectangular distribution. The invariance method can 
also be applied to produce good estimates for the regression problem when the 
“errors” form a sample from any fixed distribution. 


Received November 4, 1955; revised November 26, 1956. 







2. Introduction. The invariance method is discussed in, for example, Blackwell and Girshick [1]. We summarize briefly its form for estimation. Let $\theta$ be a parameter that indexes the probability distributions, and let there be a group of transformations $s$ on the sample space that leaves the class of probability distributions unchanged. Suppose that the group of transformations is such that any of the probability distributions can be transformed into any other. This implies that the risk function for any invariant procedure is constant valued. Let $m(x)$ label the invariant subsets on the sample space for $x$. Let $\bar s$ be the transformation on the parameter space corresponding to the transformation $s$ on the sample space. Then it is easily seen that a minimum risk estimator $f(x)$ may be found from any invariant estimator $f_0(x)$ by finding a transformation $\bar s_m$ for each $m$ such that

$$E_\theta\{W(\bar s_m f_0(x), \theta) \mid m(x) = m\}$$

is minimized for any fixed $\theta$, where $W(f, \theta)$ is an invariant loss function, and setting

$$f(x) = \bar s_{m(x)} f_0(x).$$


3. A regression problem. This problem was suggested by Prof. E. G. Olds. 
Let $Y_1, \ldots, Y_n$ be real valued random variables with the following structure:

$$Y_i = \sum_{j=1}^{r} \beta_j x_{ij} + U_i, \tag{1}$$

where $U_1, \ldots, U_n$ are independent random variables and each is uniformly distributed with mean 0 and known range $\delta$. In vector notation we have $Y = \sum_j \beta_j x_j + U$. The $x_{ij}$ are given numbers and the $\beta_j$ are unknown regression coefficients. The problem is to obtain good estimates of the regression coefficients. In this section we find invariant estimators with minimum variances.

To simplify the notation we consider the case having $r = 2$ and

$$Y_i = \alpha + \beta x_i + U_i, \tag{2}$$

where the $x_i$ have been adjusted so that $\sum x_i = 0$. Let the loss function for an estimate $(f, g)$ of $(\alpha, \beta)$ be a weighted sum of the squared errors:

$$W(f, g; \alpha, \beta) = p(f - \alpha)^2 + q(g - \beta)^2, \qquad p, q \ge 0. \tag{3}$$

As a class of transformations consider

$$\{y_i' = y_i + a + b x_i\ (i = 1, \ldots, n) \mid (a, b) \in R^2\}. \tag{4}$$

It is straightforward to see that this class of transformations is an invariant class. Also the induced group of transformations on the parameter space is easily seen to be

$$\{\alpha' = \alpha + a,\ \beta' = \beta + b \mid (a, b) \in R^2\}. \tag{5}$$


Any $(\alpha, \beta) \in R^2$ can be transformed into any other point $(\alpha', \beta') \in R^2$; hence the group leaves no proper subset of the parameter space invariant. Now, restricting ourselves to invariant estimators, we find that an estimator $(f(y), g(y))$ for $(\alpha, \beta)$ has the value

$$[f(y) + a,\ g(y) + b] \tag{6}$$

at the point

$$(y_1, \ldots, y_n) + a(1, \ldots, 1) + b(x_1, \ldots, x_n).$$


For convenience in describing (6) we introduce new coordinates in $R^n$, say $w_1, \ldots, w_n$, using $(1, \ldots, 1)$ and $(x_1, \ldots, x_n)$ as the unit points for the first two coordinates $w_1, w_2$ and any $(n - 2)$ orthogonal vectors for the remaining coordinates. Then, letting $f^*, g^*$ be the functions $f, g$ expressed in terms of the new coordinates, we obtain from (6):

$$f^*(a, b, w_3, \ldots, w_n) = a + f^*(0, 0, w_3, \ldots, w_n),$$
$$g^*(a, b, w_3, \ldots, w_n) = b + g^*(0, 0, w_3, \ldots, w_n).$$


We need minimize the risk only for a single parameter value, say $\alpha = 0$, $\beta = 0$, and it will be uniformly minimized. The risk when $\alpha = 0$, $\beta = 0$ is

$$\begin{aligned} k\int_C &\left[p\,(a + f^*(0, 0, w_3, \ldots, w_n))^2 + q\,(b + g^*(0, 0, w_3, \ldots, w_n))^2\right] da\,db\,dw_3\cdots dw_n \\ &= kp\int_C [a + f^*(0, 0, w_3, \ldots, w_n)]^2\, da\,db\,dw_3\cdots dw_n + kq\int_C [b + g^*(0, 0, w_3, \ldots, w_n)]^2\, da\,db\,dw_3\cdots dw_n, \end{aligned}$$

where $k$ is the constant value of the Jacobian from $y_1, \ldots, y_n$ to $w_1, \ldots, w_n$, and $C$ is the set of values of $(a, b, w_3, \ldots, w_n)$ corresponding to the "cube" $[-\delta/2, \delta/2]^n$ in the coordinates $y_1, \ldots, y_n$. It is easily seen that the values of $f^*(0, 0, w_3, \ldots, w_n)$, $g^*(0, 0, w_3, \ldots, w_n)$ which minimize the risk are such that $[-f^*(0, 0, w_3, \ldots, w_n), -g^*(0, 0, w_3, \ldots, w_n)]$ is the center of gravity of the set

$$C(w_3, \ldots, w_n) = \{(a, b) \mid (a, b, w_3, \ldots, w_n) \in C\},$$

the $w_3, \ldots, w_n$ section of $C$. This choice of $f^*, g^*$ produces the minimum risk invariant estimate for $\alpha, \beta$.

The determination of the values of the functions $f^*(0, 0, w_3, \ldots, w_n)$, $g^*(0, 0, w_3, \ldots, w_n)$ can, however, be simplified. If we change the sign of the coordinates of all points in $C(w_3, \ldots, w_n)$, then the center of gravity of the new set will have as coordinates the minimizing values $f^*(0, 0, w_3, \ldots, w_n)$, $g^*(0, 0, w_3, \ldots, w_n)$ determined in the paragraph above. This altered set has, however, a simple interpretation: it is the set of points $(a, b)$ such that the cube $C$, shifted to have center at $(a, b, 0, \ldots, 0)$, contains the point $(0, 0, w_3, \ldots, w_n)$. Similarly, the value of the estimator,

$$[a' + f^*(0, 0, w_3, \ldots, w_n),\ b' + g^*(0, 0, w_3, \ldots, w_n)],$$

or $[f^*(a', b', w_3, \ldots, w_n), g^*(a', b', w_3, \ldots, w_n)]$, is the center of gravity of the points $(a, b)$ for which the cube $C$, shifted to have center at $(a, b, 0, \ldots, 0)$, contains the point $(a', b', w_3, \ldots, w_n)$. Equivalently, the estimate of $(\alpha, \beta)$ is the center of gravity of the points $(a, b)$ for which the line $y = a + bx$ is a possible regression line for the observed points $(x_1, y_1), \ldots, (x_n, y_n)$, i.e., for which $y = a + bx$ is within $\delta/2$ vertically of each point $(x_i, y_i)$, $i = 1, \ldots, n$.
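The center-of-gravity description translates directly into a computation. The feasible set $\{(a, b) : |y_i - a - b x_i| \le \delta/2 \text{ for all } i\}$ is a convex polygon. Here it is obtained by clipping a bounding box with the $2n$ half-planes (Sutherland-Hodgman clipping), and its centroid follows from the shoelace formula. This is a minimal sketch that assumes, as in the text, that $\sum x_i = 0$. The bounding box uses $|a - \bar y| \le \delta/2$ and $|b - \sum x_i y_i/\sum x_i^2| \le (\delta/2)\sum|x_i|/\sum x_i^2$. Both bounds follow from summing the constraints with weights 1 and $x_i$. The function names are ours.

```python
def clip(poly, ca, cb, c):
    """Sutherland-Hodgman step: keep the part of a convex polygon where ca*a + cb*b <= c."""
    out = []
    for k in range(len(poly)):
        p, q = poly[k], poly[(k + 1) % len(poly)]
        fp = ca * p[0] + cb * p[1] - c
        fq = ca * q[0] + cb * q[1] - c
        if fp <= 0:
            out.append(p)                                   # p lies inside the half-plane
        if (fp < 0 < fq) or (fq < 0 < fp):                  # edge p -> q crosses the boundary
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def centroid(poly):
    """Center of gravity of a simple polygon (shoelace formula)."""
    twice_area = ca = cb = 0.0
    for k in range(len(poly)):
        (a0, b0), (a1, b1) = poly[k], poly[(k + 1) % len(poly)]
        cross = a0 * b1 - a1 * b0
        twice_area += cross
        ca += (a0 + a1) * cross
        cb += (b0 + b1) * cross
    return ca / (3.0 * twice_area), cb / (3.0 * twice_area)


def invariant_estimate(x, y, delta):
    """Center of gravity of all (a, b) with |y_i - a - b x_i| <= delta/2 for every i.

    Assumes the x_i are centred (sum(x) == 0); returns None if no line fits the data.
    """
    if abs(sum(x)) > 1e-9 * max(1.0, sum(abs(v) for v in x)):
        raise ValueError("the x_i must be centred so that sum(x) = 0")
    n = len(x)
    ybar = sum(y) / n
    sxx = sum(v * v for v in x)
    bhat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    ha = 0.5 * delta * 1.01                                  # |a - ybar| <= delta/2, slightly enlarged
    hb = 0.5 * delta * sum(abs(v) for v in x) / sxx * 1.01   # bound on |b - bhat|, slightly enlarged
    poly = [(ybar - ha, bhat - hb), (ybar + ha, bhat - hb),
            (ybar + ha, bhat + hb), (ybar - ha, bhat + hb)]
    for xi, yi in zip(x, y):
        poly = clip(poly, 1.0, xi, yi + delta / 2)           # a + b x_i - y_i <= delta/2
        poly = clip(poly, -1.0, -xi, delta / 2 - yi)         # y_i - a - b x_i <= delta/2
        if len(poly) < 3:
            return None
    return centroid(poly)
```

For example, `invariant_estimate([-1.0, 1.0], [0.3, 1.1], 1.0)` returns approximately (0.7, 0.4), the center of the square of feasible lines through two points. Shifting every $y_i$ by $a + b x_i$ shifts the estimate by $(a, b)$, which is the invariance property the text relies on.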

It is of interest to note that the estimates of $\alpha$ and $\beta$ are $\bar y$ and $\sum y_i x_i / \sum x_i^2$ plus corrections which depend only on the deviations from the usual regression line. This is essentially the invariance requirement.

The methods of Section 3 up to formulas (7) and (8) may be applied in much 
the same manner to any regression problem for which the errors are a sample 
from some given fixed distribution. 

REFERENCE 
[1] David Blackwell and M. A. Girshick, Theory of Games and Statistical Decisions, John Wiley & Sons, New York, 1954.




ON DISCRETE VARIABLES WHOSE SUM IS
ABSOLUTELY CONTINUOUS¹


By David Blackwell
University of California, Berkeley 


1. Summary. Suppose $\{Z_n\}$, $n = 1, 2, \ldots$, is a stationary stochastic process with $D$ states $0, 1, \ldots, D - 1$, and $X = \sum_{n=1}^{\infty} Z_n/D^n$. Harris [1] has shown that the distribution of $X$ is absolutely continuous if and only if the $Z_n$ are independent and uniformly distributed over $0, 1, \ldots, D - 1$, i.e., if and only if the distribution of $X$ is uniform on the unit interval. In this note we show the following. Let $\{Z_n\}$, $n = 1, 2, \ldots$, be any stochastic process with $D$ states $0, 1, \ldots, D - 1$ such that $X = \sum Z_n/D^n$ has an absolutely continuous distribution. Then the conditional distribution of $R_k = \sum_{n=1}^{\infty} Z_{k+n}/D^n$ given $Z_1, \ldots, Z_k$ converges to the uniform distribution on the unit interval with probability 1 as $k \to \infty$. It follows that the unconditional distribution of $R_k$ converges to the uniform distribution as $k \to \infty$. If $\{Z_n\}$ is stationary, the distribution of $R_k$ is independent of $k$, so the result of Harris follows.


2. Proof of the theorem.

THEOREM. Let $\{Z_n\}$, $n = 1, 2, \ldots$, be a sequence of random variables, each assuming only the values $0, 1, \ldots, D - 1$, such that $X = \sum_{n=1}^{\infty} Z_n/D^n$ has an absolutely continuous distribution, and let $0 < \lambda \le 1$. Then

$$U_k(\lambda) = P\Big(\sum_{n=1}^{\infty} Z_{k+n}/D^n < \lambda \,\Big|\, Z_1, \ldots, Z_k\Big) \to \lambda$$

with probability 1 as $k \to \infty$.


Received July 18, 1956.
¹ This paper was prepared with the partial support of the Air Research and Development Command under contract AF 41(657)-29, with the USAF School of Aviation Medicine.





PROOF. Say $X$ has density $p$ with respect to Lebesgue measure on the unit interval. Then

$$U_k(\lambda) = \lambda\, d(y_k(X), \lambda D^{-k}) / d(y_k(X), D^{-k}),$$

where $y_k(s) = mD^{-k}$ for $mD^{-k} \le s < (m + 1)D^{-k}$, $m = 0, 1, \ldots, D^k - 1$, and $d(a, h) = h^{-1}\int_a^{a+h} p(s)\,ds$. We must show that

$$\lambda\, d(y_k(s), \lambda D^{-k}) / d(y_k(s), D^{-k}) \to \lambda$$

for almost all $s$ (Lebesgue measure) for which $p(s) > 0$, and this will follow from

$$d(y_k(s), \lambda D^{-k}) \to p(s) \quad \text{a.e.} \tag{1}$$

Now a basic theorem of real variable theory asserts that

$$d(s, h) \to p(s) \quad \text{a.e.} \tag{2}$$

as $h \to 0$. Let $a_k(s) = (s - y_k(s))/\lambda D^{-k}$. Then

$$\begin{aligned} d(y_k(s), \lambda D^{-k}) &= a_k(s)\, d(s, y_k(s) - s) + [1 - a_k(s)]\, d(s, y_k(s) + \lambda D^{-k} - s) \\ &= a_k(s)[d(s, y_k(s) - s) - d(s, y_k(s) + \lambda D^{-k} - s)] + d(s, y_k(s) + \lambda D^{-k} - s). \end{aligned} \tag{3}$$

Since $a_k(s)$ is bounded, letting $k \to \infty$ in (3) and using (2) yields (1), and the proof is complete.
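The convergence of $U_k(\lambda)$ can be watched directly. As a purely illustrative assumption, take $D = 10$ and let $X$ have density $p(s) = 2s$ on $(0, 1)$, so that $\int_0^s p = s^2$. Conditioning on $Z_1, \ldots, Z_k$ fixes the $D$-adic interval $[y_k, y_k + D^{-k})$ containing $X$, and $U_k(\lambda)$ is the ratio of integrals of $p$ appearing in the proof. With exact rationals one finds $U_k(\lambda) - \lambda = -\lambda(1 - \lambda)D^{-k}/(2y_k + D^{-k})$, which shrinks to 0 as $k$ grows. The function names are hypothetical.

```python
import math
from fractions import Fraction


def cdf(v):
    """Distribution function of the illustrative density p(s) = 2s on (0, 1)."""
    return v * v


def u_k(s, lam, k, D=10):
    """P(R_k < lam | Z_1..Z_k) when X = s, i.e. lam * d(y_k, lam h) / d(y_k, h) with h = D^-k."""
    h = Fraction(1, D ** k)
    y = math.floor(s * D ** k) * h          # y_k(s): left end of the D-adic interval containing s
    return (cdf(y + lam * h) - cdf(y)) / (cdf(y + h) - cdf(y))
```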
REFERENCE 
[1] T. E. Harris, "On chains of infinite order," Pacific J. Math., Vol. 5 (1955), pp. 707-724.




A PROOF THAT THE SEQUENTIAL PROBABILITY RATIO TEST 
(S.P.R.T.) OF THE GENERAL LINEAR HYPOTHESIS 
TERMINATES WITH PROBABILITY UNITY 


By W. D. Ray 
British Coal Utilisation Research Association 


1. Introduction. It can be shown [1], [2] that the S.P.R.T. of the general linear hypothesis resolves itself into the following form of procedure: continue sampling at stage $(n)$ if

$$\frac{\beta}{1 - \alpha} < e^{-\lambda^{(n)}/2}\, M\!\left(a(n), \gamma; \frac{\frac12\lambda^{(n)} G^{(n)}}{1 + G^{(n)}}\right) < \frac{1 - \beta}{\alpha}; \tag{1}$$

otherwise accept or reject the null hypothesis, depending upon whether the left-hand or right-hand inequality is violated.


Received June 14, 1956; revised September 4, 1956. 





Here $\lambda^{(n)}$ characterizes the alternative hypothesis, $a(n)$ is half the sum of the degrees of freedom of the numerator and denominator of the test criterion $G^{(n)} = S_1/S_0$, and $\gamma$ is half the degrees of freedom of $S_1$; $\alpha$, $\beta$ are the probabilities of error of the first and second kind respectively. $\lambda^{(n)}$ and $a(n)$ are each linear functions of $n$, the number of observations taken. $\gamma$ is a fixed positive constant (where $a(n) > \gamma > 0$), and

$$M(a, \gamma; u) = \sum_{r=0}^{\infty} \frac{\Gamma(\gamma)\,\Gamma(a + r)}{\Gamma(a)\,\Gamma(\gamma + r)}\,\frac{u^r}{r!}.$$

Sampling is terminated whenever $G^{(n)} \le G_1^{(n)}$ or $G^{(n)} \ge G_2^{(n)}$, where $G_1^{(n)}$, $G_2^{(n)}$ are solutions of the equations

$$e^{-\lambda^{(n)}/2}\, M\!\left(a(n), \gamma; \frac{\frac12\lambda^{(n)} G^{(n)}}{1 + G^{(n)}}\right) = A,\ B \ \text{respectively}, \tag{2}$$

where $A = \beta/(1 - \alpha)$, $B = (1 - \beta)/\alpha$.


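2. Proof that the test terminates with probability unity. It will be sufficient to prove that, as $n \to \infty$, $G_1^{(n)}$ and $G_2^{(n)}$ tend to a common limit $G_0$, say. Now

$$f_n(G) = e^{-\lambda^{(n)}/2}\, M\!\left(a(n), \gamma; \frac{\frac12\lambda^{(n)} G}{1 + G}\right).$$

From a recurrence relation of the confluent hypergeometric function $M(a, \gamma; u)$ it can be shown that

$$\frac{\gamma}{a} < \frac{M(a + 1, \gamma + 1; u)}{M(a, \gamma; u)} < 1 \qquad (\text{for } u > 0). \tag{3}$$

Since $f_n'(G) = \dfrac{\lambda a}{2\gamma(1 + G)^2}\, e^{-\lambda/2} M(a + 1, \gamma + 1; u)$ with $u = \frac12\lambda G/(1 + G)$, it follows that

$$\frac{\frac12\lambda\, f_n(G)}{(1 + G)^2} < f_n'(G) < \frac{(\frac12\lambda a/\gamma)\, f_n(G)}{(1 + G)^2} \qquad \text{for all } G > 0. \tag{4}$$

Let $g_n(G) = \log_e f_n(G)$. Then from (4) it follows that, for $G > 0$,

$$\frac{\frac12\lambda}{(1 + G)^2} < g_n'(G) < \frac{\frac12\lambda a}{\gamma(1 + G)^2}.$$

Since $\lambda \to \infty$ as $n \to \infty$, this inequality shows that $g_n'(G) \to \infty$ as $n \to \infty$. Further, since $g_n'(G) > 0$, $g_n(G)$ is a strictly increasing continuous function of $G$, and it follows that there can exist at most one value of $G$, say $G_0$, where $g_n(G)$ does not become infinite as $n \to \infty$. Consequently $g_n(G) \to -\infty$ for $G < G_0$ and $g_n(G) \to +\infty$ for $G > G_0$. In terms of $f_n(G)$, this implies that $f_n(G) \to 0$ for $G < G_0$, and $f_n(G) \to \infty$ for $G > G_0$. This in turn implies that $G_1^{(n)} \to G_0$ and $G_2^{(n)} \to G_0$, and sampling must therefore terminate.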







If there does not exist a finite $G_0$ for which $g_n(G)$ does not become infinite, then $g_n(G)$ becomes infinite for all $G > 0$. Thus $f_n(G)$ either becomes infinite for all $G > 0$ or approaches zero for all $G > 0$. In the first case, sampling will terminate because $f_n(G) > B$ for sufficiently large $n$ for all $G > 0$; and in the second case too, since $f_n(G) < A$ for sufficiently large $n$ for all $G > 0$.
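The mechanism of the proof can be seen numerically. The sketch below sums Kummer's series $M(a, \gamma; u)$ directly and checks the bounds (3). It then finds the boundaries $G_1^{(n)}$, $G_2^{(n)}$ of (2) by bisection on $g_n(G) = \log f_n(G)$, which is increasing. The linear parametrization $\lambda^{(n)} = n/2$, $a(n) = (n + 2)/2$, $\gamma = 1$ and the error rates $\alpha = \beta = 0.05$ are hypothetical choices made only for illustration. Applying the mean value theorem to the lower bound for $g_n'$ gives $G_2^{(n)} - G_1^{(n)} \le 2(1 + G_2^{(n)})^2 \log(B/A)/\lambda^{(n)}$. The gap therefore closes at rate $O(1/\lambda^{(n)})$ while the boundaries stay bounded.

```python
import math


def kummer_m(a, g, u, rel_tol=1e-16, max_terms=100_000):
    """Kummer's M(a, g; u) = sum_r (a)_r u^r / ((g)_r r!), for u >= 0 and a > g > 0."""
    term = total = 1.0
    for r in range(max_terms):
        term *= (a + r) * u / ((g + r) * (r + 1))
        total += term
        # For a > g the ratio of successive terms decreases in r; once it is below 1/2
        # the neglected tail is smaller than the last term added.
        if term <= rel_tol * total and (a + r + 1) * u <= 0.5 * (g + r + 1) * (r + 2):
            break
    return total


def log_f(G, lam, a, g):
    """g_n(G) = log f_n(G), f_n(G) = exp(-lam/2) M(a, g; lam G / (2 (1 + G)))."""
    return -lam / 2 + math.log(kummer_m(a, g, lam * G / (2 * (1 + G))))


def boundary(target, lam, a, g, iters=200):
    """Solve f_n(G) = target for G > 0 by bisection; None if there is no root."""
    log_t = math.log(target)
    if log_f(0.0, lam, a, g) >= log_t:
        return None
    lo, hi = 0.0, 1.0
    while log_f(hi, lam, a, g) < log_t:     # expand until the root is bracketed
        lo, hi = hi, 2 * hi
        if hi > 1e12:
            return None
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_f(mid, lam, a, g) < log_t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ALPHA = BETA = 0.05                          # hypothetical error probabilities
A, B = BETA / (1 - ALPHA), (1 - BETA) / ALPHA
GAMMA = 1.0                                  # hypothetical: numerator with 2 degrees of freedom


def sprt_boundaries(n):
    lam, a = 0.5 * n, 0.5 * (n + 2)          # hypothetical linear dependence on n
    return lam, boundary(A, lam, a, GAMMA), boundary(B, lam, a, GAMMA)


for n in (20, 40, 80, 160):
    lam, g1, g2 = sprt_boundaries(n)
    print(f"n={n}: G1={g1:.4f}, G2={g2:.4f}, gap={g2 - g1:.4f}")
```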


3. Comments. It has been possible to obtain an upper bound for the limiting value $G_0$, but not to obtain its value uniquely. David and Kruskal [3] have provided a solution to the same problem for the sequential t-test.


4. Acknowledgement. I am most grateful to Dr. N. L. Johnson for his guid- 
ance during research on this problem, to the referee for his comments, and to 
the British Coal Utilisation Research Association for permission to publish this 
paper. 

REFERENCES 
[1] N. L. Johnson, "Some notes on the application of sequential methods in the analysis of variance," Ann. Math. Stat., Vol. 24 (1953), pp. 614-623.

[2] P. G. Hoel, "On a sequential test for the general linear hypothesis," Ann. Math. Stat., Vol. 26 (1955), pp. 136-139.

[3] H. T. David and W. H. Kruskal, "The WAGR sequential t-test reaches a decision with probability one," Ann. Math. Stat., Vol. 27 (1956), pp. 797-805.




ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Washington meeting of the Institute, March 7-9, 1957) 


1. Synchronization of Trajectory Images of Ballistic Missiles and the Timing Record of the Ground Telemetry Recording System, Harry P. Hartkemeier, Stanford University (introduced by Paul R. Rider).


In order to compute the position, velocity, and acceleration of a missile, it is necessary 
to synchronize the image pattern from ballistic camera plate records and the timing record 
of the ground telemetry recording system. In the past this has been done by personal 
inspection. This takes too much time; consequently, a method by which the two records 
may be matched by high-speed electronic computers is required to speed up the work. 

The missile is equipped with two strobe lights, one on each side, which are supposed to 
flash simultaneously when scheduled to do so by a programmer. Inside the missile there is 
a timing generator controlled by a tape punched according to a coding pattern. When the 
timing generator sends a signal for the strobe lights to flash, it also sends a signal
simultaneously to the telemetry transmitter. This signal reaches the ground recording telemetry
system through a radio link. A method of matching these two records by using correlation 
technique and an electronic computer is presented. (Received November 6, 1956.) 


2. Maximum Likelihood Estimates in a Simple Queue, A. Bruce Clarke, University of Michigan (By Title).


A simple stationary queueing process is a process having a Poisson input (with parameter $\lambda$) and a negative exponential service time (with mean $1/\mu$, $\mu > \lambda$). Let $\nu$ = the initial queue size, $x_i$ = the time of the $i$th arrival, and $y_i$ = the "busy time" up to the $i$th departure. The sequences $\{x_i\}$ and $\{y_i\}$ then represent the transition times of independent Poisson processes (parameters $\lambda$ and $\mu$), and $\nu$, $\{x_i\}$, $\{y_i\}$ together characterize the process. By observing the process for a fixed "busy time" $\tau$ and using the above comment, maximum likelihood estimates for $\lambda$ and $\mu$ may be obtained in terms of $\nu$, $m$ = the total number of departures, $T$ = the time of the $m$th departure, and $n$ = the total number of arrivals up to time $T$. Under certain conditions these estimates of $\lambda$ and $\mu$ may be approximated by $(n + \nu)/T$ and $(m - \nu)/\tau$. (Received November 12, 1956.)


3. A Rank Order Test for Trend in Correlated Means, Ardie Lubin, Walter Reed Army Institute of Research, Washington, D. C.


In many experiments the major interest is not in the amount of difference caused by the 
treatments but the rank-order which results. This is especially true when successive
measurements are made on the same subject, and the "treatments" are simply varying amounts
of fatigue, sleep loss, etc., i.e., some function of time. For such studies the null hypothesis 
is that no trend exists and generally the only alternative hypothesis is a rank-order that 
can be specified by the experimenter. 

A. R. Jonckheere ("A distribution-free k-sample test against ordered alternatives,"
Biometrika, Vol. 41 (1954)) has used Kendall’s tau to obtain a general statistic, P, for test- 
ing the agreement between a hypothesized rank-order for n objects or scores and a set of 
observed rankings of the n scores by m judges. From this general approach, he derives a 
test for trend as a special case. 

As an alternative to Jonckheere's P, a statistic J based on Spearman's $S(d^2)$ is examined.
It is the sum of the $S(d^2)$ values computed between the m observed rankings of the n scores
and the hypothesized rank-order of the n scores. K, the average rank order correlation 
between the m rankings and the hypothesized rank-order, is a simple algebraic function 
of J. 

It is shown that J is slightly more sensitive than Jonckheere's P statistic for small
values of n, but that P tends to normality faster than J. (Received November 13, 1956.)
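A minimal sketch of the two statistics follows. It assumes that there are no ties. It also assumes that K is the mean of the m Spearman coefficients $\rho = 1 - 6S(d^2)/(n(n^2 - 1))$, which makes $K = 1 - 6J/(mn(n^2 - 1))$. The abstract does not spell out this normalization, and the function names are ours.

```python
from fractions import Fraction


def s_d2(ranking, hypothesized):
    """Spearman's S(d^2): sum of squared rank differences."""
    return sum((r - h) ** 2 for r, h in zip(ranking, hypothesized))


def trend_statistics(observed, hypothesized):
    """Return (J, K) for m observed rankings of n scores against a hypothesized rank-order."""
    m, n = len(observed), len(hypothesized)
    J = sum(s_d2(r, hypothesized) for r in observed)
    K = 1 - Fraction(6 * J, m * n * (n * n - 1))   # mean Spearman rho, assuming no ties
    return J, K
```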


4. On the Stochastic Structure of Minkowski-Leontief Systems, David Rosenblatt, American University.


A linear system $x(I - A) = w$ is said to be of Minkowski-Leontief type if $A$ is a finite nonnegative square matrix of order $n$ with no row sum exceeding unity and $x$, $w$ are nonnegative row vectors. A non-null solution $x$ of such a system is called admissible. Theorem: Every system of Minkowski-Leontief type $x(I - A) = w$ which exhibits at least one admissible solution is equivalent to a unique system $\bar x(I - \bar A) = \theta$, where $\bar A$ is a stochastic matrix depending on $A$ and $w$ and $\theta$ is a null vector of dimension at most $n + 1$. Every admissible solution of $x(I - A) = w$ (appropriately extended or contracted) is proportional to a convex linear combination of the stationary stochastic vectors of $\bar A$. If $A$ is nonstochastic and $w \neq \theta_n$, let $\bar A$ denote the matrix

$$\bar A = \begin{pmatrix} A & b \\ w^* & 0 \end{pmatrix},$$

where $b = (I - A)e'$, $w^* = \lambda^{-1}w$, $\lambda = we'$, and $e$ is the row vector with all elements unity. If $(I - A)^{-1}$ exists and $w \neq \theta_n$, there exists a single ergodic set of indices; if $w$ is positive, the stationary vector of $\bar A$ is positive. Clearly,

$$\bar x = (w(I - A)^{-1}, \lambda).$$

If $w = \theta_n$ and $(I - A)$ is singular, $\bar A$ is taken as

$$\bar A = \begin{pmatrix} A_r & 0 \\ 0 & I_{n-r} \end{pmatrix},$$

where $A_r$ is the largest stochastic principal submatrix of $A$. Systems of the present type occur in economic input-output analysis and generally in socio-physical models based on "balanced-margin" tables, i.e., nonnegative square matrices $X$ such that $eX = eX'$. (Received November 20, 1956.)
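The construction of $\bar A$ and the stated form of its stationary vector can be verified with exact rationals. In the sketch below the $2 \times 2$ matrix $A$ and the vector $w$ are hypothetical. $\bar A$ is bordered by the column $b = (I - A)e'$ and the row $w^* = \lambda^{-1}w$, with 0 in the corner so that every row sums to 1. The sketch also checks two relations from the companion abstract that follows: the "multiplier" $\sum_j \bar x_j/\lambda = 1/p_{n+1}$ for the normalized stationary vector $p$, and $\sum_j (1 - r_j)x_j = \lambda$. The function names are ours.

```python
from fractions import Fraction


def solve_left(A, w):
    """Solve x (I - A) = w for the row vector x (Gauss-Jordan on the transposed system)."""
    n = len(A)
    # Row i of the augmented matrix encodes sum_j x_j (I - A)[j][i] = w_i.
    M = [[Fraction(int(i == j)) - A[j][i] for j in range(n)] + [Fraction(w[i])] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)   # (I - A) assumed nonsingular
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def bordered_stochastic(A, w):
    """Return (A_bar, lam): A_bar = [[A, b], [w*, 0]], b = (I - A)e', w* = w/lam, lam = we'."""
    lam = sum(Fraction(v) for v in w)
    rows = [[Fraction(v) for v in row] + [1 - sum(Fraction(v) for v in row)] for row in A]
    rows.append([Fraction(v) / lam for v in w] + [Fraction(0)])
    return rows, lam
```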







5. The Joint Distribution of a Set of Sufficient Statistics for the Parameters of a Simple Telephone Exchange Model, Václav Edvard Beneš, Bell Telephone Laboratories (By Title).


This paper considers a simple telephone exchange model which has an infinite number 
of trunks and in which the traffic depends on two parameters, the calling-rate and the 
mean holding-time. It is desired to estimate these parameters by observing the model 
continuously during a finite interval, and noting the calling-time and hang-up time of 
each call, insofar as these times fall within the interval. It is shown that the resulting 
information may, for the purpose of this estimate, be reduced without loss to four sta- 
tistics. These statistics are the number of calls found at the start of observation, the num- 
ber of calls arriving during observation, the number of calls leaving during observation, 
and the average number of calls existing during the interval of observation. The joint 
distribution of these sufficient statistics is determined (in principle) by deriving a generat- 
ing function for it. From this generating function, the means, variances, covariances, and 
correlation coefficients are obtained. Various estimators for the parameters of the model 
are compared, and some of their distributions, means, and variances presented. (Received 
November 29, 1956.) 
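The four statistics are simple functions of the call record. As an illustration, and assuming a hypothetical record in which each call is an (arrival, departure) pair and the observation window is $[0, T]$ (the boundary conventions and the data below are assumptions, not taken from the paper), they can be computed as follows; the average number of calls in progress is the total in-window holding time divided by $T$.

```python
def exchange_statistics(calls, T):
    """Return (found_at_start, arrivals, departures, average_in_progress) for the window [0, T].

    calls: iterable of (arrival, departure) times; departure may be float('inf')
    for a call still in progress after the window closes.
    """
    found = sum(1 for a, d in calls if a <= 0 < d)         # in progress at time 0
    arrivals = sum(1 for a, d in calls if 0 < a <= T)      # calls arriving in (0, T]
    departures = sum(1 for a, d in calls if 0 < d <= T)    # calls leaving in (0, T]
    # The integral of N(t) over [0, T] equals the total holding time inside the window.
    busy_time = sum(max(0.0, min(d, T) - max(a, 0.0)) for a, d in calls)
    return found, arrivals, departures, busy_time / T


calls = [(-2.0, 1.5), (-0.5, 4.0), (1.0, 3.0), (2.5, 12.0), (6.0, 7.0)]  # hypothetical record
print(exchange_statistics(calls, T=10.0))
```

Here the number in progress at time $T$ is 2 + 3 − 4 = 1, which matches the single call still active at $T = 10$.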


6. On the Stochastic Structure of Minkowski-Leontief Systems, II, David
Rosenblatt, American University, (By Title).


Consider a system $x(I - A) = w$ of Minkowski-Leontief type such that $(I - A)^{-1}$ exists. Clearly, $(I - A)^{-1}$ exists if and only if $A$ contains no stochastic principal submatrix. In a static economic input-output context the element $a_{ij}$ is designated as the input (per unit output) to industry or activity $i$ procured from industry $j$; $w_j$ and $x_j$ are, respectively, final output and total output (or activity level) of the $j$th industry. Consider the uniquely corresponding system $\bar{x}(I - \bar{A}) = \bar{\theta}$, where $\bar{A}$ is stochastic. The unique stationary stochastic vector of $\bar{A}$ is given by $(p_{n+1} w^*(I - A)^{-1}, p_{n+1})$. The "multiplier" $\mu = \sum_{j=1}^{n+1} x_j/\lambda$ is given by $1/p_{n+1}$, where $\lambda = x_{n+1} = we'$. Given a nonsingular matrix $(I - A)$, the following relation holds in components of an admissible solution for any $w$: $\sum_{j=1}^{n} (1 - r_j)x_j - x_{n+1} = 0$, where $x_{n+1} = we'$ and $r_j$ is the $j$th row sum in $A$. The latter relation is the technical production-possibility function of the economy in an input-output sense; $-\Delta x_j/\Delta x_k = (1 - r_k)/(1 - r_j)$ and $\Delta x_j/\Delta x_{n+1} = 1/(1 - r_j)$, $j \neq k$, $j, k = 1, \ldots, n$, are the invariant "substitution ratios" of the system, obviously independent of $w$. Let $x$ be an admissible solution of $x(I - A) = w$, $(I - A)$ singular or not, and let $D(x, \lambda)$ be a diagonal matrix with the components of $x$ and $\lambda$ on the diagonal. Then $D(x, \lambda)\bar{A}$ is a "balanced-margin" table. Consistent
with a noted “substitution” result, D7: Kyo; = 2.,; = 4, where K; = 1 for all 7 inde- 
pendently of w. (Received December 17, 1956.) 
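The production-possibility relation and the balanced-margin property can be verified on a small case. This sketch uses hypothetical numbers (they are not from the paper), obtains $x$ by the iteration $x \leftarrow w + xA$, and checks that $\sum_j (1 - r_j)x_j - x_{n+1}$ vanishes and that $D(x, \lambda)\bar{A}$ has equal row and column sums.

```python
A = [[0.2, 0.3], [0.1, 0.4]]  # hypothetical input coefficients (row sums r_j = 0.5)
w = [1.0, 2.0]                # hypothetical final outputs
n = len(A)
lam = sum(w)                  # x_{n+1} = lambda = w e'

# Admissible solution of x(I - A) = w via the iteration x <- w + xA.
x = list(w)
for _ in range(2000):
    x = [w[j] + sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]

r = [sum(row) for row in A]   # row sums of A
# Production-possibility relation: sum_j (1 - r_j) x_j - x_{n+1} = 0.
residual = sum((1 - r[j]) * x[j] for j in range(n)) - lam

# Augmented stochastic matrix A_bar = [[A, b], [w*, 0]] and the table D(x, lambda) A_bar.
A_bar = [A[i] + [1 - r[i]] for i in range(n)] + [[wj / lam for wj in w] + [0.0]]
x_bar = x + [lam]
table = [[x_bar[i] * A_bar[i][j] for j in range(n + 1)] for i in range(n + 1)]
row_sums = [sum(table[i]) for i in range(n + 1)]
col_sums = [sum(table[i][j] for i in range(n + 1)) for j in range(n + 1)]

# Substitution ratio -dx_1/dx_2 = (1 - r_2)/(1 - r_1): it depends on A only, not on w.
ratio_01 = (1 - r[1]) / (1 - r[0])
print(residual, row_sums, col_sums, ratio_01)
```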


7. A Further Contribution to the Theory of Systematic Statistics, Junjiro
Ogawa, University of North Carolina.


Up to 1945 the main interest of statistical estimation had been in the "efficient esti-
mator," but from the point of view of practical use, it seems reasonable to inquire whether
comparable results could have been obtained by a smaller expenditure. F. Mosteller (1946)
proposed the use of systematic statistics in this connection. The author (1952) developed
a systematic theory of estimation and testing hypotheses with respect to the location and
scale parameters of a population whose density depends on only these two parameters.

There are many cases in which the samples are by their very nature ordered in magni-
tude, for example in a life test of electric lamps. In such cases the population probability
distributions are usually supposed to be exponential. Thus, at least for the exponential







distribution, estimation and testing of a hypothesis based upon systematic statistics are 
of great importance from the standpoint of practical application. 

There will be presented in this paper the table of the optimum spacings of the selected
sample quantiles, the corresponding best estimators, and a discussion of the testing procedure
for a statistical hypothesis on the scale parameter $\sigma$ of the exponential distribution
$f(x) = (1/\sigma)e^{-x/\sigma}$ for $x > 0$. (Received January 7, 1957.)


8. On the Stochastic Structure of Minkowski-Leontief Systems, III, David
Rosenblatt, American University, (By Title).


Consider any system $x(I - A) = w$ of M-L type. The following "aggregation" problem is of interest. Let an aggregation matrix $C$ be an $(n \times r)$ stochastic matrix of incidence type, $1 \le r < n$. Let $B = f(A)$ be an M-L matrix of order $r$. We consider conditions under which $\xi AC = \xi CB$ obtains for admissible solutions $\xi$ of a system $x(I - A) = w$. The following case is of special interest. Let a weight matrix $E$ be an $(n \times n)$ diagonal matrix with nonnegative entries on the principal diagonal. A consolidation of a matrix $A$ of M-L type is an $(r \times r)$ matrix $B = B(A; C, E) = (C'EC)^{-1}C'EAC$, $1 \le r < n$. "Faithful consolidation" of a stochastic system $x(I - A) = \theta$ is characterized from the standpoint of ergodic structure; the condition $AC = CB(A; C, E)$ is of particular interest. A general consolidation condition for M-L systems is related to the "combining-of-classes" condition of stochastic learning theory. The following is of economic interest: the existence of $(I - B)^{-1}$ does not in general imply the existence of $(I - A)^{-1}$, and conversely. In the static input-output model of II, the ergodic structure of $\bar{A}$ of the equivalent system (and the role of the mean recurrence time $1/p_{n+1}$) suggest that the stationary stochastic vector $p$ of $\bar{A}$ be computed iteratively using successive powers of $\bar{A}$, yielding $\bar{x}$, in lieu of matrix inversion with or without consolidation; in most applications, $\lim_{k \to \infty} \bar{A}^k$ exists. (Received January 14, 1957.)
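A small numerical sketch of the consolidation formula and of the power method mentioned at the end of the abstract follows. It assumes a hypothetical $3 \times 3$ stochastic matrix whose first two indices are aggregated, with $E = I$; for an incidence-type $C$ and a diagonal $E$ the matrix $C'EC$ is diagonal, so the sketch inverts it entrywise rather than calling a general solver. For this matrix the condition $AC = CB(A; C, E)$ holds, and repeated multiplication $p \leftarrow pA$ recovers the stationary vector without any matrix inversion.

```python
def consolidate(A, groups, weights):
    """B = (C'EC)^{-1} C'EAC for an incidence-type aggregation matrix C.

    groups[i] is the aggregate (column of C) containing index i; weights[i] is E_ii.
    Since each row of C has a single 1 and E is diagonal, C'EC is diagonal and is
    inverted entrywise (each group must have positive total weight).
    """
    n, r = len(A), max(groups) + 1
    AC = [[sum(A[i][k] for k in range(n) if groups[k] == g) for g in range(r)] for i in range(n)]
    num = [[0.0] * r for _ in range(r)]  # accumulates C'EAC
    total = [0.0] * r                    # diagonal of C'EC
    for i in range(n):
        total[groups[i]] += weights[i]
        for g in range(r):
            num[groups[i]][g] += weights[i] * AC[i][g]
    B = [[num[h][g] / total[h] for g in range(r)] for h in range(r)]
    return B, AC


def stationary_by_powers(P, steps=500):
    """Approximate the stationary vector of a stochastic matrix by repeated products p <- pP."""
    m = len(P)
    p = [1.0 / m] * m
    for _ in range(steps):
        p = [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]
    return p


A = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.1, 0.8]]  # hypothetical stochastic matrix
groups = [0, 0, 1]                                          # aggregates {1, 2} and {3}
B, AC = consolidate(A, groups, weights=[1.0, 1.0, 1.0])     # E = I
CB = [B[groups[i]] for i in range(len(A))]                  # row i of CB is row groups[i] of B
print(B)
print(all(abs(u - v) < 1e-12 for ra, rb in zip(AC, CB) for u, v in zip(ra, rb)))
print(stationary_by_powers(A))
```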


9. The Use of Incomplete Block Designs for Asymmetrical Factorial Arrange-
ments, Marvin Zelen, National Bureau of Standards.


Let $A_s$ ($s = 1, 2, \ldots, m$) denote the $s$th factor in an $m$-factor factorial experiment such that $A_s$ has $m_s$ levels. Let $i = (i_1, i_2, \ldots, i_m)$ represent a particular experimental combination of the $m$ factors and let the mathematical model underlying the measurements be

$$y_{ij} = \mu + \sum_{s=1}^{m} (\alpha_s)_{i_s} + \sum_{s<t} (\alpha_{st})_{i_s i_t} + \cdots + (\alpha_{12\ldots m})_{i_1 i_2 \ldots i_m} + b_j + e_{ij},$$

where $(\alpha_s)_{i_s}, (\alpha_{st})_{i_s i_t}, \ldots, (\alpha_{12\ldots m})_{i_1 i_2 \ldots i_m}$ represent the various main effects and interactions, $b_j$ represents the block effect, and the $e_{ij}$ are NID$(0, \sigma^2)$. Algorithms are given for using the balanced incomplete and the group divisible designs for asymmetrical factorial arrangements. Let $M(s)$ be the square matrix (of dimension $m_s$) $M(s) = m_s I - J$, where $J$ is a matrix having all elements unity, and define the direct product of $p$ such matrices by $M(1, 2, \ldots, p) = M(1) \times M(2) \times \cdots \times M(p)$ ($p \le m$). Then the variance-covariance matrix of a $p$-factor interaction for the G.D. case can be written as $M(1, 2, \ldots, p)\,\sigma^2/(E_t r v)$ ($t = 1$ or $2$). For the BIBD, the same expression holds with $E_1 = E_2$. The correlations between the different interactions are all zero, and since $M^2(1, 2, \ldots, p) = M(1, 2, \ldots, p) \prod_{s=1}^{p} m_s$, the quantity $(E_t r v / \prod_{s=1}^{p} m_s) \sum (\hat{\alpha}_{12\ldots p})^2_{i_1 i_2 \ldots i_p}$ follows a $\sigma^2 \chi^2$ distribution with $\prod_{s=1}^{p} (m_s - 1)$ degrees of freedom under the hypothesis of no $p$-factor interaction effects. (Received January 16, 1957.)
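The identity $M^2(1, 2, \ldots, p) = M(1, 2, \ldots, p)\prod m_s$ follows from $J^2 = m_s J$, which gives $(m_s I - J)^2 = m_s(m_s I - J)$, together with the rule $(X \times Y)(X' \times Y') = XX' \times YY'$ for direct products. The sketch below checks it for hypothetical level counts $m_1 = 2$, $m_2 = 3$; it also checks that the trace of $M/\prod m_s$, which equals its rank because that matrix is idempotent, is $\prod (m_s - 1)$, the degrees of freedom quoted in the abstract.

```python
from math import prod


def m_matrix(m):
    """M(s) = m_s I - J for a factor with m levels."""
    return [[(m if i == j else 0) - 1 for j in range(m)] for i in range(m)]


def kron(X, Y):
    """Direct (Kronecker) product of two square matrices given as nested lists."""
    p, q = len(X), len(Y)
    return [[X[i // q][j // q] * Y[i % q][j % q] for j in range(p * q)] for i in range(p * q)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


levels = [2, 3]  # hypothetical numbers of levels m_1, m_2
M = m_matrix(levels[0])
for m in levels[1:]:
    M = kron(M, m_matrix(m))

lhs = matmul(M, M)                                  # M^2(1, ..., p)
rhs = [[prod(levels) * v for v in row] for row in M]  # M(1, ..., p) * prod m_s
trace = sum(M[i][i] for i in range(len(M)))
print(lhs == rhs, trace // prod(levels), prod(m - 1 for m in levels))
```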


10. An Extension of the Cramér-Rao Inequality, John J. Gart, Virginia Poly-
technic Institute, (By Title).


Consider a frequency function $f(x \mid \theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_s)$, the function being specified when $\theta$ is specified. The parameter $\theta$ has a density $g(\theta)$ independent of $x$. Let $X = (x_1, x_2, \ldots, x_n)$ be a random sample from a randomly chosen population having the specified frequency function. Then if $\phi = \prod_{i=1}^{n} f(x_i \mid \theta)$ and $t$ (independent of $\theta$) is an estimate of $\theta_k$, $1 \le k \le s$, there follows a form similar to the Cramér-Rao Inequality,

$$E\,E[(t - \theta_k)^2 \mid \theta] \ge \{E[E(t \mid \theta) - \theta_k]\}^2 + \frac{E^2\{\partial E(t \mid \theta)/\partial \theta_k\}}{E\{E[(\partial \ln \phi/\partial \theta_k)^2 \mid \theta]\}}.$$

The equality is reached if and only if $t$ is an unbiased sufficient statistic having the normal distribution with constant variance. In this case the equality holds regardless of the form of $g(\theta)$. (Received January 17, 1957.)


11. Multivariate Analysis of Variance, S. N. Roy, University of North Carolina.


Consider a model under which we have stochastic variates $X(p \times n) = [x_1 \ldots x_n]$ such that the $x_i$'s (for $i = 1, 2, \ldots, n$) are independent $N[E(x_i), \Sigma]$, $E(X') = A(n \times m)\,\xi(m \times p)$, $A$ (to be called the design matrix) is a matrix of constants given by the design of the experiment, $\xi$ is a matrix of unknown parameters, $\mathrm{rank}(A) = r \le m < n$, $p \le n - r$, and $\Sigma$ is an unknown dispersion matrix. Under this model suppose we have a testable hypothesis (the meaning and mathematical criterion for testability being discussed in the paper) $H_0: C(s \times m)\,\xi(m \times p)\,M(p \times q) = 0(s \times q)$, where $C$ and $M$ (to be called the hypothesis matrices) are given such that $\mathrm{rank}(C) = s \le r$ and $\mathrm{rank}(M) = q \le p$. The alternative is $H: C\xi M = \eta(s \times q)\ (\ne 0)$. The test is that at a level $\alpha$ we accept $H_0$ if $c_{\max}(S^* S^{-1}) \le c_\alpha$ and reject $H_0$ otherwise, where $S^*$ and $S$ are matrices given (in the paper) in terms of $X$, $A$, $C$ and $M$, $c_{\max}(T)$ denotes the largest root of a matrix $T$ with real nonnegative roots, and $c_\alpha$ is a constant depending on $\alpha$, $\min(s, q)$ and $n - r$, which can be read from tables now under construction and expected to be published shortly. (Received January 17, 1957.)


12. Confidence Bounds Associated with Multivariate Analysis of Variance,
S. N. Roy and R. Gnanadesikan, University of North Carolina.


We start from the same setup as in the previous paper. The matrices $S^*$ and $S$ (to be called, respectively, the dispersion matrix "due to the hypothesis" and the dispersion matrix "due to the error") are the exact analogs of the variance "due to the hypothesis" and that "due to the error" in the customary univariate analysis of variance. Given any level $\alpha$, we can pick up a constant $c_\alpha$ from the tables mentioned in the previous paper and make, with a probability greater than or equal to $1 - \alpha$, the confidence interval statement

$$c_{\max}(S^*) - [s c_\alpha]^{1/2} \times c_{\max}(S) \le c_{\max}[\eta' U \eta] \le c_{\max}(S^*) + [s c_\alpha]^{1/2}\, c_{\max}(S),$$

where $U(s \times s)$ is a nonsingular matrix given (in the paper) in terms of $A$ and $C$, and $c_{\max}[\eta' U \eta]$ is zero if and only if $\eta = 0$, i.e., $H_0$ is true. With a joint probability greater than or equal to $1 - \alpha$ we can also make simultaneous confidence interval statements, including the one given above and others exactly similar to it but in terms of $S^{(i)}$, $S^{(i)*}$, $\eta^{(i)}$ (for $i = 1, 2, \ldots, p$), next in terms of $S^{(i,j)}$, $S^{(i,j)*}$, $\eta^{(i,j)}$ (for $i \ne j = 1, 2, \ldots, p$), and so on, where $S^{(i)}$ and $S^{(i)*}$ stand respectively for the truncated matrices obtained by cutting out the $i$th row and $i$th column from $S$ and $S^*$, $\eta^{(i)}$ for $\eta$ with the $i$th column cut out, $S^{(i,j)}$ and $S^{(i,j)*}$ for $S$ and $S^*$ with the $i$th and $j$th rows and columns cut out, $\eta^{(i,j)}$ for $\eta$ with the $i$th and $j$th columns cut out, and so on. (Received January 17, 1957.)


13. Extension of Some Results Given by Mitra on "Statistical Analysis of
Categorical Data," Earl Diamond, University of North Carolina.


This is a follow-up of two previous papers ([1] "Some non-parametric generalizations of analysis of variance and multivariate analysis" by S. N. Roy and S. K. Mitra, Biometrika, December, 1956, and [2] "Contributions to the statistical analysis of categorical data" by S. K. Mitra, North Carolina Institute of Statistics Mimeograph Series No. 142). We start from a product of multinomial distributions of the form

$$\phi = \prod_j \Big\{ n_{0j}! \prod_i p_{ij}^{n_{ij}} \Big/ \prod_i n_{ij}! \Big\}$$

with $\sum_i p_{ij} = 1$, $i = i_1 i_2 \ldots i_k$, $j = j_1 j_2 \ldots j_l$; $i_h = 1, 2, \ldots, r_h$ ($h = 1, 2, \ldots, k$); $j_1 \in (s_1)_{j_2 \ldots j_l}$ (a subset of $s_1$ depending on the subscript set $j_2 \ldots j_l$); $j_2 \in (s_2)_{j_3 \ldots j_l}$; $\ldots$; $j_{l-1} \in (s_{l-1})_{j_l}$; and $j_l = 1, 2, \ldots, s_l$. We next consider two hypotheses, $H_0^{(1)}: p_{ij} = f_{ij}^{(1)}(\theta_1, \ldots, \theta_{t_1})$ subject to $g_m^{(1)}(\theta_1, \ldots, \theta_{t_1}) = 0$ ($m = 1, 2, \ldots, u_1 < t_1$) and $H_0^{(2)}: p_{ij} = f_{ij}^{(2)}(\theta'_1, \ldots, \theta'_{t_2})$ subject to $g_m^{(2)}(\theta'_1, \ldots, \theta'_{t_2}) = 0$ ($m = 1, 2, \ldots, u_2 < t_2$), where $t_1, t_2 <$ (total number of cells) $-$ (total number of multinomial distributions). Each hypothesis is a composite one in which the $\theta$'s or $\theta'$'s are the nuisance parameters, and $f^{(1)}$, $g^{(1)}$, $f^{(2)}$ and $g^{(2)}$ are known functions. Tests are taken over from Refs. [1] and [2], and the asymptotic powers of the tests and the conditions for asymptotic independence are derived; these extend similar conditions for more special cases discussed in [2]. (Received January 17, 1957.)


14. Testing of Hypotheses on a Mixture of Variates Some of Which are Con-
tinuous and the Rest Categorical, S. N. Roy and M. D. Moustafa,
University of North Carolina.


We start from a $(k + l)$-variate distribution in which $k$ variates are continuous and $l$
variates are categorical. The $k$ variates are assumed to have a conditional multivariate
normal distribution with respect to the $l$ categorical variates, which are assumed to have
a multinomial distribution. Appropriate hypotheses are framed in this situation, analogous
to the customary hypotheses on a single multivariate normal distribution (or to those
in Refs. [1] and [2] of the previous abstract); large sample tests of such hypotheses are de-
veloped and some of their properties studied. Next, instead of assuming a single multi-
nomial distribution on the $l$ categorical variates, a product of multinomial distributions
is assumed and hypotheses are framed in this situation analogous to the customary ones 
for several multivariate normal distributions or to those in Refs. [1] and [2], and large 
sample tests of such hypotheses and some of their properties are studied. (Received Janu- 
ary 17, 1957.) 


15. On Statistics Independent of a Sufficient Statistic, Evan J. Williams,
North Carolina State College.


It is shown that if, for a sample drawn from a population of values of $x$ with distribu-
tion depending on a parameter $\theta$, the statistic $z$ is sufficient for $\theta$, and $g$ is any statistic
whose distribution is independent of $\theta$, then $g$ and $z$ are independently distributed. The
method of proof is less sophisticated than that of Basu (Sankhyā, Vol. 15 (1955), p. 377).

The result has application to the normal distribution: the mean of a sample is distributed 
independently of any location-free statistic; and to the gamma distribution: the mean of 
a sample is distributed independently of any scale-free statistic. These well-known re- 
sults follow since the sample mean is a sufficient statistic, in the former case for the loca- 
tion parameter, in the latter case for the scale parameter. 
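A Monte Carlo check illustrates (but does not prove) the two applications. The sketch assumes samples of size 5, a normal population with mean 10 and standard deviation 2, and an exponential population with mean 2 (all arbitrary choices); the sample range serves as a location-free statistic and $X_1/\sum X_i$ as a scale-free one. Independence implies zero correlation, so both correlations with the sample mean should be near zero, whereas the range of the exponential sample, which is not scale-free, is clearly correlated with the mean.

```python
import random


def correlation(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


rng = random.Random(1957)  # fixed seed for reproducibility
reps, n = 20000, 5
means_n, ranges_n, means_e, ratios_e, ranges_e = [], [], [], [], []
for _ in range(reps):
    xs = [rng.gauss(10.0, 2.0) for _ in range(n)]  # normal sample
    means_n.append(sum(xs) / n)
    ranges_n.append(max(xs) - min(xs))             # location-free statistic
    ys = [rng.expovariate(0.5) for _ in range(n)]  # exponential (gamma) sample, mean 2
    means_e.append(sum(ys) / n)
    ratios_e.append(ys[0] / sum(ys))               # scale-free statistic
    ranges_e.append(max(ys) - min(ys))             # not scale-free

print(round(correlation(means_n, ranges_n), 3),
      round(correlation(means_e, ratios_e), 3),
      round(correlation(means_e, ranges_e), 3))
```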

The limitations of the general result lie in the difficulty of deriving statistics independent 
of parameters other than location and scale parameters. 


The connexion of the theorem with estimation theory is discussed. (Received January 
17, 1957.) 


16. Generalized Quantal Response in Biological Assay, John Gurland, Iowa
State College.


The quantal (all-or-none) response in biological assay refers to a response in which one 
of two possible outcomes occurs. In a bioassay such as that of an insecticide based on 
mortality of the housefly, say, there are, however, three possible outcomes, namely, alive, 
moribund, dead. The present paper considers a generalized quantal response in which 







two or more outcomes are possible. Whether one uses normits (cf. probits) or logits or 
other transformations, a general method of analyzing the data is developed which makes 
explicit use of all the possible outcomes and hence is more efficient than the common pro- 
cedure of pooling some outcomes (for example, moribund and dead) in order to make the 
response all-or-none. Further, a technique analogous to that used in discriminant functions 
is suggested as a method which makes more efficient use of the data than the pooling method 
mentioned above. (Received January 21, 1957.) 


17. The Variance of Zero-Crossing Intervals, J. A. McFadden, U. S. Naval
Ordnance Laboratory, (introduced by Gilbert Lieberman).


Two expressions are given for the variance of the intervals between successive zeros of a random process. It is assumed that the successive intervals form a Markoff chain. If $x(t)$ is a random process, let $y(t) = 1$ when $x(t) \ge 0$ and $y(t) = -1$ when $x(t) < 0$. Let $\beta$ be the expected number of zeros per second and let $\kappa$ be the correlation coefficient between two successive zero-crossing intervals. Then the variance is $\sigma^2 = (2A/\beta)(1 + \kappa)/(1 - \kappa)$, or alternatively, $\sigma^2 = [(1 + 2B)/\beta^2](1 - \kappa)/(1 + \kappa)$, where $A = \int_0^\infty r(\tau)\,d\tau$ and $B = \int_0^\infty [Q(\tau) - \beta]\,d\tau$. Here $r(\tau)$ is the autocorrelation function of the process $y(t)$, and $Q(\tau)\,d\tau$ is the conditional probability of a zero between $t + \tau$ and $t + \tau + d\tau$, given a zero at time $t$. (Received January 21, 1957.)
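The two expressions are easy to evaluate once $A$, $B$, $\beta$ and $\kappa$ are known. As a sanity check that is not part of the abstract, the sketch uses the case of zeros forming a Poisson process with rate $\beta$: then $y(t)$ is a random telegraph signal with $r(\tau) = e^{-2\beta\tau}$, so $A = 1/(2\beta)$; $Q(\tau) = \beta$, so $B = 0$; and the intervals are independent exponentials, so $\kappa = 0$ and both formulas should return $1/\beta^2$.

```python
def variance_from_autocorrelation(A, beta, kappa):
    """sigma^2 = (2A/beta)(1 + kappa)/(1 - kappa), with A = integral_0^inf r(tau) dtau."""
    return (2.0 * A / beta) * (1.0 + kappa) / (1.0 - kappa)


def variance_from_conditional_rate(B, beta, kappa):
    """sigma^2 = [(1 + 2B)/beta^2](1 - kappa)/(1 + kappa), with B = integral_0^inf [Q(tau) - beta] dtau."""
    return (1.0 + 2.0 * B) / beta ** 2 * (1.0 - kappa) / (1.0 + kappa)


# Poisson zeros at rate beta (random telegraph signal): A = 1/(2 beta), B = 0, kappa = 0.
beta = 4.0
A = 1.0 / (2.0 * beta)
print(variance_from_autocorrelation(A, beta, 0.0),
      variance_from_conditional_rate(0.0, beta, 0.0),
      1.0 / beta ** 2)  # variance of an exponential interval with mean 1/beta
```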


18. A Limit Theorem and Bounds for an Optional Stopping Probability, Morris
Skibinsky, Michigan State University, (By Title).


Let $S_j$ be the standardized $j$th partial sum of a sequence of bounded, independent, identically distributed random variables, $K$ a positive constant, and let

$$Q(m, n, K) = \Pr\{\max_{m \le j \le n} S_j \ge K\}.$$

It is shown by elementary methods that if $\lim_{m \to \infty} [(n - m)/m^{1/2}] = 0$, then $\lim_{m \to \infty} Q(m, n, K) = 1 - \Phi(K)$, where $\Phi$ is the standard normal c.d.f. Certain steps in the proof are then used to obtain simple bounds for $Q(m, n, K)$ when the sequence of random variables is generated from Bernoulli trials. (Received January 21, 1957.)
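The limit can be compared with an exact computation in the Bernoulli case. This sketch assumes fair $\pm 1$ summands (mean 0, variance 1, so $S_j = T_j/\sqrt{j}$ with $T_j$ the raw partial sum) and computes $Q(m, n, K)$ exactly by propagating the distribution of $T_j$ over $m \le j \le n$ and removing the mass that has already crossed the boundary; with $m = 400$ and $n = 402$ the ratio $(n - m)/m^{1/2}$ is 0.1. It is not the bounding argument of the paper, and at this $m$ the lattice of the walk still leaves a visible gap from $1 - \Phi(K)$.

```python
import math


def q_exact(m, n, K):
    """Exact Pr{max_{m<=j<=n} S_j >= K} for S_j = T_j / sqrt(j), T_j a sum of j fair +-1 steps."""
    # T_m = 2B - m with B ~ Binomial(m, 1/2); keep only paths that have not yet crossed.
    alive, hit = {}, 0.0
    for b in range(m + 1):
        t, pr = 2 * b - m, math.comb(m, b) / 2 ** m
        if t >= K * math.sqrt(m):
            hit += pr
        else:
            alive[t] = alive.get(t, 0.0) + pr
    for j in range(m + 1, n + 1):
        nxt = {}
        for t, pr in alive.items():
            for step in (-1, 1):
                nxt[t + step] = nxt.get(t + step, 0.0) + pr / 2
        alive = {}
        for t, pr in nxt.items():
            if t >= K * math.sqrt(j):
                hit += pr          # first crossing at index j
            else:
                alive[t] = pr
    return hit


def normal_tail(K):
    """1 - Phi(K) for the standard normal c.d.f. Phi."""
    return 0.5 * math.erfc(K / math.sqrt(2.0))


print(round(q_exact(400, 402, 1.0), 4), round(normal_tail(1.0), 4))
```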


19. A Limit Theorem of Cramér and Its Generalization, Junjiro Ogawa,
University of North Carolina, (By Title).


As a generalization of Doob's theorem, H. Cramér states the following theorem: Suppose we have for every $\nu = 1, 2, \ldots$, $y_\nu = A x_\nu + z_\nu$, where $x_\nu$, $y_\nu$ and $z_\nu$ are $n$-dimensional random variables, while $A$ is a matrix of order $(n \times n)$ with constant elements. Suppose further that as $\nu \to \infty$, the $n$-dimensional distribution of $x_\nu$ tends to a certain limiting distribution, while $z_\nu$ converges in probability to zero. Then $y_\nu$ has the limiting distribution defined by the linear transformation $y = Ax$, where $x$ has the limiting distribution of the $x_\nu$ (H. Cramér, Mathematical Methods of Statistics, Princeton, 1946, pp. 299-300). Cramér skips the proof of this theorem. In this paper, the complete proof of this theorem will be given, and two theorems which generalize it and are useful in statistics will be proved. (Received January 22, 1957.)


20. On the Mathematical Principles Underlying the Theory of the $\chi^2$ Test,
Junjiro Ogawa, University of North Carolina, (By Title).


The rigorous proof of the theorem that the $\chi^2$ statistic has the limiting chi-square dis-
tribution with degrees of freedom reduced by the number of the independent parameters







which were estimated from the frequency data was first given by H. Cramér in his famous
book Mathematical Methods of Statistics, Princeton (1946), but some steps of the proof
were skipped. Later on S. N. Roy and S. K. Mitra (Biometrika, Vol. 43 (1956)) and S. K.
Mitra (Thesis, University of North Carolina, 1956) reasoned along the same lines and
obtained theorems adapted to various physical situations. The purposes of this paper are to
present a complete and self-contained proof of Cramér's theorem on the one hand, and on
the other to explain how, from the mathematical point of view, the proofs of the related
theorems obtained by S. N. Roy and S. K. Mitra can be reduced to that of Cramér's theorem.
(Received January 22, 1957.)


21. Minimization of Certain Integrals Subject to Linear Constraints, (Preliminary
Report), C. H. Kraft and I. Olkin, Michigan State University.


Let $F$ be the class of measures $f$ such that $E_f q_i(x) = a_i$, $i = 1, \ldots, n$, and $E_f H(f) < \infty$. The problem of minimizing $E_f H(f)$ over $F$ has been treated by Shannon [Bell System Technical Journal, Vol. 27 (1948), pp. 623-656] for $H(f) = \log f$, $q(x) = x^2$, using the calculus of variations, and by Weiss [Ann. Math. Stat., Vol. 27 (1956), pp. 851-853] for $H(f) = f$ and arbitrary square integrable $q_i(x)$.

The following considerations apply to these cases as well as others, e.g. $H(f) = f^2$. An inequality of the form $E_f H(f) \ge T(f, g)$ for all densities $g$ is available, where $T(g, g) = E_g H(g)$. $T(f, g)$ is constant for $f \in F$ if and only if $g(x) = \sum b_i q_i(x)$. The bound is attainable if the constants $b_i$ can be chosen so that $g \in F$. These considerations extend the proofs to not necessarily dominated families $F$ on any measure space. (Received January 23, 1957.)


22. The Recovery of Intervariety Information, Bradley Bucher, Princeton
University.


Assume, in the incomplete block model $y_{ij} = m + b_j + t_i + e_{ij}$, that the block effects are independently distributed with mean 0 and variance $\delta^2$, that the error terms $e_{ij}$ are independently distributed with mean 0 and variance $\sigma^2$, that the variety effects $t_1, \ldots, t_k$ are fixed effects, and that $t_{k+1}, \ldots, t_n$ are independently distributed with mean 0 and variance $\gamma^2$. Then in estimating any linear combination of the variety effects, say, $a_1 t_1 + a_2 t_2 + \cdots + a_k t_k$, we may make use of information among the varieties $t_{k+1}, \ldots, t_n$. Minimum variance linear unbiased estimates are obtained for such combinations for a large class of incomplete block designs. In general, these estimates have smaller variance than analogous estimates obtained using only inter- and intra-block recovery. For balanced incomplete blocks the estimate with intervariety recovery is shown to be the same as the combined intra- and inter-block estimate. Several techniques are developed which are useful for finding estimates using intervariety recovery. The problem of estimating $\gamma^2$ is discussed. Useful applications of the technique of intervariety recovery are considered. (Received January 24, 1957.)


23. Some Uses of Quasi-Ranges II, J. T. Chu and F. C. Leone, Case Institute
of Technology, and C. W. Topp, Fenn College, Cleveland, Ohio.


In "Some uses of quasi-ranges" (Ann. Math. Stat., Vol. 28 (1957), No. 1), methods are
given for using quasi-ranges to obtain confidence intervals for, and tests of hypotheses
about, some measures of dispersion of a given distribution (such as the interquantile
distance and the standard deviation). In this paper, further research is done on the selec-
tion of quasi-ranges for making inferences about the standard deviations of the normal,
rectangular, and exponential distributions. The methods are also extended to the
coefficient of variation, the difference and ratio of interquantile distances and standard
deviations of two given distributions, etc. Tables are given to facilitate applications.
(Received January 24, 1957.)


24. On Selecting a Subset Which Contains All Populations Better Than a
Standard, Shanti S. Gupta and Milton Sobel, Bell Telephone Laboratories.


Populations $\pi_i$ ($i = 0, 1, \ldots, p$) are given with a common Koopman-Darmois distribution of known form, differing only in the value of the unknown parameter $\tau_i$ ($i = 1, 2, \ldots, p$); cases of known and unknown $\tau_0$ (associated with the standard $\pi_0$) are treated separately. Location and scale parameter problems are both treated. In some problems $\pi_i$ is defined as better than $\pi_0$ if $\tau_i > \tau_0$; in others if $\tau_i < \tau_0$. A procedure is given in each case for selecting a small subset so that, for any true configuration, the probability of including all $\pi_i$ equal to or better than $\pi_0$ is at least $P^*$, $P^* < 1$ being preassigned. For the location parameter, with $\tau_0$ unknown, the procedure is to retain all $\pi_i$ with $T_i = \sum_{j=1}^{n_i} w(x_{ij}) \ge T_0 - d/(n_i)^{1/2}$; here $T_i$ is sufficient for $\tau_i$ ($i = 0, 1, \ldots, p$). For scale parameter problems, with smaller $\tau_i$ more preferable, the procedure retains all $\pi_i$ with $\sum_{j=1}^{n_i} w(x_{ij}) \le (1 + d) \sum_{j=1}^{n_0} w(x_{0j})$. In several problems the value of $d$ is computed and tables are given for different $P^*$ and $p$-values; in others transformations are used to "normalize" the problem. The normal and chi-square distributions are used as applications. Problems involving binomial and Poisson distributions are treated separately with and without normalizing transformations. (Received January 24, 1957.)


25. On the Relation Between Loss Functions and Significance Levels, (Preliminary
Report), H. Robert van der Vaart, North Carolina State
College and Leiden University.


Consider a one-parameter family $\{P_\theta\}$ of probability distributions. Suppose it is required to test $H_0: \theta = \theta_0$ against $H_1: \theta > \theta_0$. Define a loss function $L = l_0$ if $H_0$ is rejected when true, $L = l_1$ if $H_1$ is rejected when true, $L = 0$ otherwise. Suppose a family $\{w_\alpha\}$ of subsets of the sample space is given, $P_\theta(w_\alpha)$ being a monotone increasing function of $\theta$ for each $w_\alpha$. Then selecting a critical region $w_\alpha$ such that $P_{\theta_0}(w_\alpha)$ has some fixed value $\alpha$ is a classical procedure, known to be minimax relative to $L$ provided $\alpha = l_1/(l_0 + l_1)$ (Sverdrup, 1953; Ruist, 1954). However, most statisticians, while fixing $P_\theta(w_\alpha) = \alpha$ for $\theta = \theta_0$, really want(ed) to reject $H_0$ only if $\theta$ differs materially from $\theta_0$, say if $\theta > \theta_1 > \theta_0$ (cf. also Hodges and Lehmann, 1954), i.e. they test(ed) $H_0': \theta \le \theta_0$ against $H_1': \theta > \theta_1$ ($\theta = \theta_0$ acting as an idealization of $H_0'$). Now the critical region $w_\alpha'$ which is minimax in the situation described by adding a prime to each $H$ in the definition of $L$ has two properties: (i) $P_{\theta_0}(w_\alpha') < \alpha = l_1/(l_0 + l_1)$, depending on $\theta_1$; (ii) $P_{\theta_0}(w_\alpha')$ is smaller with more powerful test families $\{w_\alpha\}$. Both effects (subsisting with loss functions allowing for indifference zones) indicate that fixing $P_\theta(w_\alpha)$ at a constant level for such "idealized null hypotheses" as $\theta = \theta_0$ may be a questionable procedure. (Received January 28, 1957.)
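Properties (i) and (ii) can be seen in a concrete model. The sketch assumes, purely for illustration, that the test is based on a sample mean distributed as $N(\theta, 1/N)$ with critical regions $\{\bar{X} > c\}$; the minimax region for $H_0'$ against $H_1'$ equalizes the maximum risks $l_0 P_{\theta_0}(\text{reject})$ and $l_1 P_{\theta_1}(\text{accept})$, and $c$ is found by bisection. When $\theta_1 = \theta_0$ the size equals $l_1/(l_0 + l_1)$; for $\theta_1 > \theta_0$ it falls below that value, and it falls further when the sample size (and hence the power of the family) increases.

```python
import math


def phi_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def minimax_size(l0, l1, theta0, theta1, N=1):
    """Size at theta0 of the minimax region {mean > c} for H0': theta <= theta0 vs H1': theta > theta1.

    Assumes the sample mean is N(theta, 1/N); c equalizes l0*P_theta0(reject) and l1*P_theta1(accept).
    """
    s = 1.0 / math.sqrt(N)
    lo, hi = theta0 - 10.0, theta1 + 10.0
    for _ in range(200):  # bisection: diff is decreasing in c
        c = 0.5 * (lo + hi)
        diff = l0 * (1.0 - phi_cdf((c - theta0) / s)) - l1 * phi_cdf((c - theta1) / s)
        if diff > 0:
            lo = c
        else:
            hi = c
    return 1.0 - phi_cdf((c - theta0) / s)


l0, l1 = 1.0, 0.05  # hypothetical losses
print(l1 / (l0 + l1))
for theta1 in (0.0, 0.5, 1.0):
    print(theta1, minimax_size(l0, l1, 0.0, theta1), minimax_size(l0, l1, 0.0, theta1, N=25))
```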


26. A Note on Fluctuations of Telephone Traffic, Václav Edvard Beneš,
Bell Telephone Laboratories, (By Title).


Let N(t) be defined as the number of calls in progress in a simple telephone exchange 
model characterized by unlimited call capacity, a general probability density of holding- 
time, and randomly arriving calls. A formula, due to Riordan, for the generating function 
of the transition probabilities of N(t) is proved. From the generating function, expressions 
for the covariance function of N(t) and for the spectral density of N(t) are determined. 







It is noted that the distributions of N(t) are completely specified by the covariance func-
tion, if N(t) is defined as above. (Received February 4, 1957.)


27. Randomization Procedures for the Estimation of Cross-Spectral Density 
Functions, A. E. Garratt, Virginia Polytechnic Institute, (By Title). 


The cross-spectral density function may be estimated by

$$\hat{\Phi}_{xy}(\omega_n) = \sum_{u} \{x(t_u)\,y(t_u + k_u \Delta t)\,G_1(k_u) + i\,x(t_u)\,y(t_u + m_u \Delta t)\,G_2(m_u)\},$$

where the $k_u$ are independently distributed according to $p_1(k)$, $k = -r, \ldots, r$; where the $m_u$ are similarly distributed according to $p_2(m)$ and are independent of the $k_u$; and where $G_1(k_u)$ and $G_2(m_u)$ are arbitrary weight functions.

It is shown that the expectation of the estimator depends on the products $p_1(k)G_1(k)$ and $p_2(m)G_2(m)$, whereas the variance of the estimator depends specifically on $p_1(k)$ and $p_2(m)$. Various specifications of the products $p_1(k)G_1(k)$ and $p_2(m)G_2(m)$ and of the probability distributions $p_1(k)$ and $p_2(m)$ are considered which provide estimators with certain optimum properties. (Received February 8, 1957.)


28. Fréchet Differentiable Functional Estimates, Gopinath Kallianpur,
Michigan State University, (introduced by Morris Skibinsky).


Suppose $f_\theta(x)$ is a probability density over the finite range $(a, b)$, which is independent of the unknown $\theta$ to be estimated. Let $\phi_n(x)$ denote an empirical density function (defined in the paper) of a sample of size $n$ from the given population. Let $G$ be a class of functionals over the Banach space $L_2$ satisfying the following conditions: (i) $G$ possesses Fréchet differentials of the first two orders at the "true point" $f_\theta$; writing $g_1[f_\theta; x]$ and $g_2[f_\theta; x, y]$ for the Fréchet derivatives of the first and second order at $f_\theta$, (ii) $g_1[f_\theta; x]$ is a continuous function of $x$ which is not zero over a set of positive measure, (iii) $|g_2[f_\theta; x, y]| \le A < \infty$, $A$ being independent of $x$ and $y$, and (iv) $G[f_\theta(x)] = \theta$ ("Fisher consistency"). Then, assuming regularity conditions which validate differentiation with respect to $\theta$, etc., and assuming $E_\theta(g_1[f_\theta; x]) = 0$ without loss of generality, it is shown that $\sqrt{n}\{G[\phi_n] - \theta\}$ is asymptotically normally distributed with zero mean and asymptotic variance $E_\theta(g_1^2[f_\theta; x])$, which satisfies Fisher's inequality $E_\theta(g_1^2[f_\theta; x]) \ge \{E_\theta[(\partial \log f_\theta/\partial \theta)^2]\}^{-1}$. An earlier paper by C. R. Rao and the author (Sankhyā, 1955) discusses similar problems for functionals of the empirical c.d.f. (Work done under ONR project at Columbia University.) (Received February 14, 1957.)


29. The Efficiency of Nonparametric Tests, Gottfried E. Noether, Boston
University.


Consider two tests of the same hypothesis at the same significance level. If, for the same
power with respect to the same alternative, one requires a sample of size $n_1$ and the other a
sample of size $n_2$, the relative efficiency of the second test with respect to the first test is
given by the ratio $n_1/n_2$. The paper surveys existing results on the relative efficiency of
important nonparametric tests with respect to corresponding parametric as well as other 
nonparametric test procedures. In particular, the following problems are considered: 
one-sample and paired comparison tests, two-sample tests, analysis of variance tests, tests 
of independence and regression, goodness of fit tests. As a general conclusion, it can be said 
that the employment of the more efficient nonparametric methods instead of the customary 
parametric methods rarely involves an appreciable loss of information, but may lead to a 
considerable gain. (Received March 1, 1957.) 





NEWS AND NOTICES 533 


30. On a Problem in Abelian Groups and the Construction of Fractionally
Replicated Designs, R. C. Bose, University of North Carolina and R. C.
Burton, National Bureau of Standards.


Consider an Abelian group of order $s^n$, generated by $n$ letters $A_1, A_2, \ldots, A_n$, with the relations $A_1^s = A_2^s = \cdots = A_n^s = I$, where $I$ is the identity and $s$ is a prime. If $G = A_1^{x_1} A_2^{x_2} \cdots A_n^{x_n}$ is any element of the group, then the number of non-zero exponents $x_i$ may be called the length of $G$. Given an integer $r < n$, the problem is to find a subgroup of order $s^r$, generated by $r$ independent elements $G_i = A_1^{x_{i1}} A_2^{x_{i2}} \cdots A_n^{x_{in}}$, such that the minimum length of the elements in the subgroup (except the length of the unit element) is greater than or equal to $k$. Consider the finite projective space $PG(r - 1, s)$. To any point $x = (x_1, x_2, \ldots, x_r)$ of this space, assign a non-negative integer $m_x$, which may be considered the measure of $x$, in such a way that the total measure for the space is $n$. To a point of measure $m$ associate $m$ different letters chosen out of $A_1, A_2, \ldots, A_n$, each of these letters being assigned to one and only one point. Let $G_i = A_1^{x_{i1}} A_2^{x_{i2}} \cdots A_n^{x_{in}}$, where $x_{ij}$ is the $i$th coordinate of the point to which $A_j$ is associated. It is proved that the length of the element $G_1^{\lambda_1} G_2^{\lambda_2} \cdots G_r^{\lambda_r}$ is the measure of the set of points not lying on the linear space $\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_r x_r = 0$. For example, let $n = 10$, $r = 4$, $s = 3$. We can find exactly 10 points on an unruled quadric in $PG(3, 3)$. If we take the corresponding subgroup as the fundamental identity for generating a $1/3^4$ fraction in a factorial design with 10 factors, then all the aliases of a main effect will have five or more factors, and all the aliases of a two-factor interaction will have four or more factors. (Received January 21, 1957.)
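The theorem turns the search for a good fraction into a counting problem in $PG(r - 1, s)$. The sketch below checks the example: it takes the elliptic (unruled) quadric $x_1x_2 + x_3^2 + x_4^2 = 0$ in $PG(3, 3)$ (one convenient equation, assumed here; any elliptic quadric is projectively equivalent), attaches one letter to each of its 10 points, and computes the length of every non-identity element $G_1^{\lambda_1}\cdots G_4^{\lambda_4}$ directly from the exponents. The minimum length 6 is what gives main-effect aliases of five or more factors and two-factor-interaction aliases of four or more.

```python
from itertools import product

S = 3  # the prime s


def projective_points(dim, s):
    """Representatives of the points of PG(dim - 1, s): nonzero vectors whose first nonzero entry is 1."""
    pts = []
    for v in product(range(s), repeat=dim):
        nz = [c for c in v if c]
        if nz and nz[0] == 1:
            pts.append(v)
    return pts


# Elliptic quadric x1*x2 + x3^2 + x4^2 = 0 in PG(3, 3); x3^2 + x4^2 has no nontrivial
# zero mod 3 because -1 is not a square mod 3.
points = [p for p in projective_points(4, S) if (p[0] * p[1] + p[2] ** 2 + p[3] ** 2) % S == 0]


def length(lams):
    """Non-zero exponents of G_1^l1 ... G_4^l4; the exponent of letter A_j is sum_i l_i x_ij mod s."""
    return sum(1 for p in points if sum(l * c for l, c in zip(lams, p)) % S != 0)


# Each point carries measure 1 (one letter per point, n = 10 letters in total).
lengths = [length(lams) for lams in product(range(S), repeat=4) if any(lams)]
print(len(points), min(lengths), sorted(set(lengths)))
```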




NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Professor Felix Bernstein, the founder and director Emeritus of the Institute
of Mathematical Statistics, University of Goettingen, Germany, died December
3, 1956 in Zuerich, Switzerland. Professor Bernstein was also a member of the
International Statistical Institute, a fellow of the Royal Statistical Society, a
fellow of the AAAS, and was professor of biometrics, New York University, from
1936 to 1945. In 1950 he was American Fulbright professor at the Institute of
Statistics, Rome, Italy.

Dr. Robert M. Blumenthal has been appointed to an instructorship at the 
University of Washington. 

Glenn L. Burrows has been appointed Staff Statistician at the Knolls Atomic 
Power Laboratory, Schenectady, New York. 

Victor Chew resigned on February 1, 1957 from the position of Assistant Pro-
fessor of Statistics, University of Florida, to become Assistant Statistician at the
Institute of Statistics at Raleigh, North Carolina, and to work towards a Ph.D. in
experimental statistics.

Professor Kai Lai Chung, on leave from Syracuse University, is a Visiting 
Professor at the University of Chicago during 1956-57. 

George E. Ferris is now with the Statistics Department of General Foods
Corporation in Hoboken, New Jersey. 





Alfred E. Garratt is now Assistant Professor of Statistics at Virginia Poly- 
technic Institute. 

Ferdinand Lemus, formerly with the Experimental Design and Statistical 
Analysis Group of Westinghouse Electric Corp., East Pittsburgh, Pa., is now 
employed as a Statistical Engineer with the Avon Lake Experimental Station, 
B. F. Goodrich Chemical Company, Avon Lake, Ohio. 

Gisiro Maruyama has been appointed Professor at Kyusyu University, Fac- 
ulty of Science, Fukuoka, Japan. 

W. Jay Merrill, Jr., received his Ph.D. from Ohio State University in De- 
cember, 1956. 

Robert Mirsky, who for the past six years has been with the Cornell Aero- 
nautical Laboratory, has joined the General Electric Company as an operations 
research analyst in their Engineering Operation at 3198 Chestnut Street, Phila- 
delphia, Pa. 

Dr. Mervin E. Muller is on leave of absence from the Scientific Computing 
Center, International Business Machines Corp., New York, in order to accept 
a position as Research Associate in Mathematics, Department of Mathematics, 
Princeton University. 

A. Carl Nelson, Jr., formerly an instructor in mathematics at the University 
of Delaware, Newark, Delaware, is now a statistician for Westinghouse Atomic 
Power Division, Pittsburgh, Pa. 

Joseph S. Rhodes has been appointed manager of operations research of Atlas
Powder Company, a new post in its Economic Evaluation Department. 

Ronald W. Shephard has been appointed a professor in the Engineering De- 
partment, University of California, Berkeley, California. 

George W. Snedecor has gone to North Carolina State College as a visiting 
professor in the Institute of Statistics, stationed with the Department of Ex- 
perimental Statistics from January until September 1957. He is engaged in con- 
sulting and teaching statistical methods. 

Dan Teichroew has been transferred by the National Cash Register Company 
to its Electronics Division in Hawthorne, California, to form a Business Systems 
Analysis Section in the Product Development Department. 

Dr. Alan E. Treloar is currently on leave from the University of Minnesota 
and serving as Director of Research for the American Hospital Association in 
Chicago. 

Frank H. Trinkl has joined The Ramo-Wooldridge Corporation as a Member
of the Technical Staff of the Computer Systems Division.

William H. Williams has been chosen to receive The George W. Snedecor 
Award in Statistics for 1957, by vote of the graduate faculty in statistics at Iowa 
State College. The award is given annually to the person judged to be most out- 
standing among those students at the college working toward a Ph.D. or joint 
Ph.D. in statistics who are expected to graduate within a specific time interval; 
it consists of a year’s membership in the Institute of Mathematical Statistics 
together with a subscription to its Annals. 





Mrs. Pearl A. Van Natta has, since September 25, 1956, been employed by 
The Denver Research Institute, Division of Physics, as a Research Mathe- 
matician. 




Use of High Speed Computers by IMS Members* 


A questionnaire was mailed to all members of the IMS in 1956 to determine 
the present status of the use of high speed computers by the membership and to 
ascertain the kind of information desired regarding these machines. Only 263 
members answered the questionnaire: 119 from colleges and universities, 90 from 
industry and private consulting and 54 from government. As a result of the 
interest in having sessions on these computers at meetings of the IMS, sessions 
were cosponsored at both the Seattle and Detroit meetings in 1956, one session 
was held at the Washington, D. C. meeting in 1957 and sessions are being 
planned for the Atlantic City meeting. The members seemed to be about equally 
divided as to topics for papers at these meetings. 

Eighty-seven per cent (87%) of those replying favored an expository article 
on the subject for the Annals. The content of such an article should emphasize 
a glossary of terms useful to statisticians, description of major machines and 
how they differ and examples of the use of the machines for statistical purposes. 
Only 53% favored a regular section on computing and 23% opposed such a 
section, 24% being indifferent. Information regarding these machines can be 
obtained from other journals, but many of the members were unfamiliar with 
either the Association for Computing Machinery (54% of those replying) or
Mathematical Tables and other Aids to Computation (40% of those replying). 
Of those replying, only 55% had used these computers; of those who had used 
the computers, 56% had done some programming. 

Of those who had used a high speed computer, 20% had used it for empirical 
sampling; 24% for Monte Carlo; 36% for data reduction; 36% for table preparation;
54% for mathematical problems; and 62% for data analysis. Forty-eight per cent
(48%) had used it for analysis of variance or regression and correlation analysis.
Other types of analysis mentioned were matrix inversion or
multiplication, computation of means and standard deviations, time series and 
spectral analysis, multivariate analysis including discriminant functions, factor 
analysis, contingency tables and χ², linear programming and inventory control,
solution of equations, numerical integration, solutions of econometric models,
tests of significance, distribution theory, complicated confidence limits, differen- 
tial and integral equations, fitting frequency curves, Boolean algebraic equa- 
tions, actuarial formulas, non-linear equations, acceptance-rejection methods, 
stress analysis, maxima of functions and control charts. The committee was 
impressed by the infrequent use of these machines for solving problems in mathe- 
matical statistics and econometrics. 


* Prepared by R. L. Anderson, Chairman of IMS Committee on High Speed Machines. 





University of Michigan Program in Mathematical 
Statistics and Probability 


The University of Michigan provides training in mathematical statistics and 
probability, (a) leading to the master’s degree, (b) leading to the doctorate. 
Courses in mathematical statistics and probability are conducted by staff mem- 
bers (including H. C. Carver, A. B. Clarke, A. H. Copeland, C. C. Craig, D. A. 
Darling, P. S. Dwyer, J. G. Wendel, Oscar Wesler). Teaching fellowships and 
assistantships are also available. Applications for these should be sent to Professor
T. H. Hildebrandt of the Department of Mathematics. Additional information may
be obtained by writing Professor C. C. Craig or Professor P. S. Dwyer of the
Department of Mathematics.




Summer Statistical Seminar 


A two-week summer statistical seminar will be held at the Endicott House 
in Dedham, Massachusetts, beginning July 29, 1957. The first week will be 
devoted to the general topic of time series with emphasis on turbulence, aero- 
nautics, ship motion, and communication, under the chairmanship of Leo Tick 
of New York University. 

The program for the second week will include business applications, reliability,
and data reduction topics, as part of the general discussion of the impact of
computers on statistical problems. Dr. Max Woodbury is chairman of this program.

Further information can be obtained from Dr. M. E. Terry, Bell Telephone 
Laboratories, Murray Hill, New Jersey, or the secretary, Dr. Geoffrey Beall, 
Gillette Safety Razor Company, Boston, Massachusetts. 


University of Chicago Department of Statistics 


The department of statistics at the University of Chicago, known since its 
organization in 1949 as the Committee on Statistics, is now called the Depart- 
ment of Statistics. The name was changed from Committee to Department in 
order to avoid confusion about the nature and status of the organization. Leonard
J. Savage, who has been Acting Chairman of the Department this year, has 
accepted a regular appointment as Chairman beginning March 1, 1957. He 
succeeds W. Allen Wallis, who now is Dean of the School of Business though he 
continues as a member of this Department. Other members of the Statistics 
faculty are K. A. Brownlee, Kai Lai Chung, Sudhish G. Ghurye, Leo A. Good- 
man, William Kruskal, John W. Pratt, Harry V. Roberts, and David L. Wallace. 





Assistance for Travel to International Congress of Mathematicians 


Funds will be made available by the National Science Foundation, aided 
by grants from industry, to provide travel assistance for a limited number of 
mathematicians attending the International Congress of Mathematicians in 
Edinburgh, August 14-21, 1958. Grants will be made on the basis of lists prepared
by the various mathematical societies. Anyone who wishes to apply through 
the Institute of Mathematical Statistics should notify its Secretary (see inside 
front cover for address) not later than October 1, giving the following informa- 
tion. (1) His address; (2) a statement whether he intends to present a paper and 
if so, whether this is by invitation; if possible the subject or title of the paper 
should be given; (3) a description of other travel funds, if any, available to him. 
The applicant should also write directly to the National Science Foundation, 
Washington 25, D. C., requesting an application form for foreign travel grants, 
and should return the completed form to the National Science Foundation. 




DOCTORAL DISSERTATIONS IN STATISTICS, 1956 


Listed below are the doctorates conferred during the year 1956 in the United 
States and Canada for which the dissertations were written on topics in statistics
or related fields. The university, major subject, and the title of the dissertation
are given in each case. Readers are invited to notify the Editor of any omissions
from this list.

John L. Bagg, Michigan State University, major in mathematical statistics, 
“A Probability Model for Theory of Organization of Groups with Multi-valued 
Relations between Persons.”

Anatole Beck, Yale, major in mathematics, “On the Random Ergodic
Theorem.” 

Robert Blumenthal, Cornell, major in probability, “An Extended Markov 
Property.” 

R. V. S. Chacon, Syracuse, major in mathematics, “Some Theorems on Continuous
Parameter Markov Chains.”

Richard Garth Cornell, Virginia Polytechnic Institute, major in statistics, 
“A New Estimation Procedure for Linear Combinations of Exponentials.”

Arthur P. Dempster, Princeton, major in statistics, “The Two-Sample Multivariate
Problem in the Degenerate Case.”

B. J. Derwort, St. Louis, major in statistics, “An Extension of the Theory of 
Cumulative Frequency Functions.” 

Olive Jean Dunn, California (Los Angeles), major in statistics, “Estimation 
Problems for Dependent Regression.” 

Sylvain Ehrenfeld, Columbia, major in mathematical statistics, “Complete
Class Theorems in Design of Experiments; Part I: Complete Class Theorems
in Experimental Design; Part II: On the Efficiency of Experimental Design.”

Alvin Vincent Fend, Illinois, major in statistics, “Unbiased Estimation and
Admissibility and the Treatment of Ties in the Sign Test.” 

Thomas Shelburne Ferguson, California (Berkeley), major in statistics, Part 
I: “On the Existence of Linear Regression in Linear Structural Relations;”
Part II: “A Method of Generating Best Asymptotically Normal Estimates 
with Application to the Estimation of Bacterial Densities.” 

Aubyn Freed, Illinois, major in mathematics, “On the Ergodic Theorem in
Dynamical Systems with Variant Measure.” 

Donald A. Gardiner, North Carolina State College, major in statistics, “Some 
Third Order Rotatable Designs.”

Donald P. Gaver, Princeton, major in probability, “Some Results in the 
Theory of Queues.”

David G. Gosslee, North Carolina State College, major in statistics, “The
Level of Significance and Power of the Unweighted Means’ Test.”

Shanti Swarup Gupta, North Carolina, major in statistics, “On a Decision 
Rule for a Problem in Ranking Means.” 

Bertram W. Haines, Johns Hopkins, major in biostatistics, “Some Procedures 
of Selecting Records for Retirement.” 

Lester LaVerne Helms, Purdue, major in mathematical statistics, “Convergence
Properties of Martingales Indexed by Directed Sets.”

G. Ronald Herd, Iowa State College, major in statistics, “Estimation of the 
Parameters of a Population from a Multi-Censored Sample.” 

William Gerow Howe, North Carolina, major in statistics, “Some Contribu- 
tions to Factor Analysis.” 

Milton V. Johns, Columbia, major in mathematical statistics, “Contributions 
to the Theory of Empirical Bayes Procedures in Statistics.” 

Eugene Arthur Johnson, Minnesota, major in biostatistics, “On the Problems 
of Errors Associated with Linear Regression.” 

Marvin A. Kastenbaum, North Carolina State College, major in statistics, 
“Analysis of Frequency Data in Multiway Contingency Tables.”

Therese M. Kelleher, North Carolina State College, major in statistics, 
“Analysis and Interpretation of Variation in Inbred Lines and F₁ Crosses in
Corn.” 

A. R. Khalil, North Carolina State College, major in statistics, “Joint Interpretation
of Heterosis and Genetic Variance in Two Corn Varieties and Their
Crosses.” 

Orval M. Klose, Washington, major in statistics, “Topics in Distribution-Free
Statistics.”

Clyde Young Kramer, Virginia Polytechnic Institute, major in statistics, 
“Factorial Treatments in Incomplete Block Designs.”

Thomas E. Kurtz, Princeton, major in statistics, “An Extension of a Multiple
Comparisons Procedure.” 

E. J. Lytle, Jr., Florida, major in statistics, “The Determination of Some
Distributions for Which the Mid-range is an Efficient Estimator of the Mean.” 





John Hans MacKay, North Carolina, major in statistics, “On the Efficiency 
of Certain Tests for 2 × 2 Tables.”

Thomas A. Magness, California (Los Angeles), major in statistics, “The Use 
of Cumulants in the Theory and Applications of Stochastic Processes.”

Carl E. Marshall, Iowa State College, major in statistics, “Cost Control of 
Sample Surveys by Two-Step Designs.” 

Samuel T. Mayo, Minnesota, major in educational psychology, “Some De- 
signs for the Collection of and Methods for the Analysis of Enumerative Data, 
with Special Applications to the Follow-up of Education Graduates.”

F. S. McFeely, Virginia Polytechnic Institute, major in statistics, “Decision
Procedures for the Comparison of Exponential and Geometric Populations.” 

Dale M. Mesner, Michigan State University, major in mathematical sta- 
tistics, “An Investigation of Certain Combinatorial Properties of Partially 
Balanced Incomplete Block Experimental Designs and Association Schemes, 
with a Detailed Study of Designs of Latin Square and Related Types.” 

Irwin Miller, Virginia Polytechnic Institute, major in statistics, “Tests of
Hypotheses Involving Desirability Relations and Some Distribution Theory 
Connected with Gaussian Processes.” 

Sujit Kumar Mitra, North Carolina, major in statistics, “Contributions to
the Statistical Analysis of Categorical Data.” 

Gilbert I. Paul, North Carolina State College, major in statistics, “A Method 
of Estimating Epistatic Variance in Random Mating Populations.” 

W. E. Perrault, St. Louis, major in statistics, “Contribution-free Population
Comparisons.”

John Winsor Pratt, Stanford, major in statistics, “Some Results in the De- 
cision Theory of One-Parameter Multivariate Polya Type Distributions.” 

Ronald Pyke, Washington, major in statistics, “On One-Sided Distribution- 
Free Statistics.” 

Roy Radner, Chicago, major in statistics, “Team Decision Problems.”

Mushfequr Rahman, McGill, major in mathematics, “A Statistical Problem
in the Geometry of Numbers (Star-shaped Domains of Quadratic and Hexagonal 
Symmetry).” 

W. L. Roach, Jr., Oregon, major in statistics, “The Application of the Ex- 
ponential Distribution to a Truncated Stochastic Process.” 

Joan Raup Rosenblatt, North Carolina, major in statistics, “On a Class of 
Non-parametric Tests.” 

Anadi Ranjan Roy, Stanford, major in statistics, “On Chi Square Statistics 
with Variable Intervals.” 

Thomas Solon Russell, Virginia Polytechnic Institute, major in statistics, 
“Estimation of Individual Variations in an Unreplicated Two-Way Classi- 
fication.” 

Jagdish Sharan Rustagi, Stanford, major in statistics, “On Minimizing and
Maximizing a Certain Integral with Statistical Applications.” 

Jerome Sacks, Cornell, major in statistics, “Asymptotic Distribution of 
Stochastic Approximation Procedures.” 





Sam Cundiff Saunders, Washington, major in statistics, “Sequential and
Randomized Distribution-Free Tolerance Limits.”

Daniel E. W. Schumann, Virginia Polytechnic Institute, major in statistics,
“The Comparison of the Sensitivities of Experiments Using Different Scales of
Measurement.”

Arthur Shapiro, California (Berkeley), major in statistics, “Some Conditions
for the Existence of Similar Regions.”

Robert Tynes Smith III, George Washington, major in mathematical statistics,
“A Stochastic Model for Economic Time Series.”

Andrew Sterrett, Pittsburgh, major in statistics, “An Efficient Method for the
Detection of Defective Members of Large Populations.”

Hale Caterson Sweeny, Virginia Polytechnic Institute, major in statistics,
“Some Results on Experimental Designs When the Usual Assumptions are
Invalid.”


Maurice M. Tatsuoka, Harvard, major in educational measurements, “Joint 
Probability of Membership and Success in a Group: An Index which Combines 
the Information from Discriminant and Regression Analyses as Applied to the 
Guidance Problem.” 

James G. C. Templeton, Princeton, major in statistics, “A Test for Detecting 
Single Cell Disturbances in Contingency Tables.” 


Aram Thomasian, California (Berkeley), major in statistics, “On the Magni- 
tude of the Sum of Error Probabilities.” 


Hale F. Trotter, Princeton, major in probability, “Convergence of Semi-Groups
of Operators.”

Donald Robert Truax, Stanford, major in statistics, “Multi-Decision Problems
for the Multivariate Exponential Family.”

Henry Tucker, North Carolina State College, major in statistics, “Sampling 
for Agricultural Price Statistics.” 

Irving Weiss, Stanford, major in statistics, “Limiting Distributions in Some 
Occupancy Problems.” 

Oscar Wesler, Stanford, major in statistics, “A Modified Minimax Principle.”

John Wesley Wilkinson, North Carolina, major in statistics, “Analysis of
Paired Comparison Designs with Incomplete Repetitions.”

Myron Johnson Willis, Purdue, major in mathematical statistics, “Exponential
Regression.”

J. W. Woll, Jr., Princeton, major in mathematics, “Homogeneous Stochastic
Processes.”




New Members 


The following persons were elected to membership in the Institute
between November 8, 1956 and February 5, 1957:


Bacon, Dr. Ralph H., Ph.D. (New York Univ.), Physicist, General Precision Lab., Pleasant- 
ville, New York. 





Brooks, W. Douglas, M.Ed. (Harvard), Research Psychologist, Educational Research 
Corporation, 10 Craigie St., Cambridge, Mass., 131 Broadway, Arlington, Massachusetts.

Camacho Dias, Antonio, Diploma in Statistics (Escuela de Estadística, Madrid), Technical
Officer of the Comisión Nacional de Productividad and Collaborator of the Instituto de
Investigaciones Estadísticas, Velázquez 47, Madrid, Spain, Escosure 5, Madrid, Spain.

Crowson, Henry L., M.S. (Univ. of Florida), Graduate Assistant, Univ. of Florida, P. O. 
Box 3013 Univ. Station, Gainesville, Florida.

D’Andrea Du Bois, N.S. Jr., M.S. (Howard Univ.), Grad. Student, Univ. of California, 
Statistical Department, 1407 Hearst St., Berkeley 2, California. 

Draper, Norman R., B.A. (Cambridge, England), Grad. Asst., Statistics Dept., Univ. of 
North Carolina, Chapel Hill, N. C. 

Dunlap, Paul R., M.Ed. (Penn. State Univ.), Sr. Reliability Analyst, RAD-Avco Mfg. 
Corp., 20 So. Union St., Lawrence, Mass., 3 Kingfisher Road, Tewksbury, Massachusetts. 

Farrell, Roger H., M.S. (Univ. of Chicago), Research Asst., Digital Computer Laboratory,
University of Illinois, Urbana, Illinois, 1022 W. Daniel St., Champaign, Illinois. 

Fleisher, Harold, Ph.D. (Case Inst. of Tech.), Mgr., Research Communications, Inter- 
national Business Machines Corp., P. O. Box 390, Poughkeepsie, New York. 

Fricke, Theresa A., M.S. (Purdue Univ.), Statistician, B. F. Goodrich Chemical Company, 
Experimental Station, Avon Lake, Ohio, 14843 E. St., Lorain, Ohio. 

Gillis, Sister Catherine Josephine, A.M. (Boston Univ.), Asst. Prof. Dept. of Mathe- 
matics, Emmanuel College, 400 The Fenway, Boston 15, Mass. 

Good, I. J., Ph.D. (Cambridge Univ.), Sen. Principal Scientific Officer, Royal Naval Sci- 
entific Service, Government Communications Hdqts., Priors Road, Cheltenham, 
England, 26 Scott House, Princess Elizabeth Way, Cheltenham, England. 

Haley, Lawrence B., M.S. (Alabama Polytechnic Inst.), Mathematical Statistician, Army 
Ballistic Missile Agency, Huntsville, Alabama, 1704 La Grande, Huntsville, Alabama. 

Holmes, D. S., M.S. (Purdue Univ.), Quality Control Engineer, General Electric Co., 
Bldg. 36, 1 River Road, Schenectady 5, New York. 

Jaynes, Lt. William E., Ph.D. (Ohio State Univ.), Res. Psychologist (MSC), Psychology 
Dept. Army Medical Research Laboratory, Fort Knox, Kentucky. 

Jones, N. F., LL.B. (New York Univ.), Asso. Actuary, The Prudential Ins. Co. of America, 
Planning and Development Dept., P.O. Drawer 594, Newark 1, N. J. 

Kudo, Akio, M.Sc. (Tokyo Univ.), Asst., Mathematical Inst., Faculty of Science, Kyushu 
Univ., Fukuoka, Japan. 

Lingappaiah, G. S., M.Sc. (Mysore Univ., India), Senior Lecturer in Stat. (India); Res. 
Asst. (Stanford Univ.); Dept. of Stat., University of Madras, Madras 5, India; Dept. 
of Stat., Stanford Univ., Stanford, Calif., P.O. Box 617, Stanford, California. 

Marshall, Albert W., (member designated by Univ. of Washington), Lab. of Stat. Re- 
search, Dept. of Mathematics, Univ. of Washington, Seattle 5, Washington. 

Moriarity, John J., (member designated by Purdue Univ.), Statistical Lab. Purdue Univ., 
Lafayette, Indiana. 

Ockelmann, Erich, Head of Market Research (with commercial power of attorney), Carl
Gabler Werbegesellschaft mbH., München 2, Karlsplatz 13, Germany, Fürstenrieder-
strasse 100/III, München 42, Germany.

Quade, Dana E., (member designated by Univ. of N. C.), Dept. of Statistics, Univ. of 
N.C., Chapel Hill, N. C.

Reich, Edgar, Ph.D. (Univ. of California, Los Angeles), Asst. Professor Math., University 
of Minnesota, Institute of Technology, Minneapolis 14, Minn. 

Seguchi, Tsunetami, M.Sc. (Kyushu Univ.), Res. Worker, Mathematical Institute, Faculty 
of Science, Kyushu Univ., Fukuoka, Japan. 

Stevens, Martin, B.S. (Temple Univ.), Reliability Engineer, General Elec. Co. (MOSD), 
3198 Chestnut Street, Philadelphia 4, Pa., 1744 Hawthorne Ave., Havertown, Pa. 

Temming, Dr. Heinz, doctor rerum naturalium (Univ. Berlin), Technical director, Peter 
Temming Aktiengesellschaft, Glückstadt near Hamburg, Germany.





Thionet, Pierre, Professeur agrégé de l’Université (Paris), Administrator at the Institut
National de la Statistique et des Études Économiques, Paris, on secondment to the Ministère
des Finances, Service des Études Économiques et Financières, 36 rue de Dunkerque, Paris
(Xe), France.

Washio, Yasutoshi, M.Sc. (Kyushu Univ.), Asst., Mathematical Inst., Faculty of Science, 
Kyushu Univ., Fukuoka, Japan. 

Weckwerth, Vernon E., B.S. (Univ. of Minnesota), Inst. and Res. Fellow, Student, Univ. of 
Minnesota, Minneapolis 14, Minn., 1612 Huron St., St. Paul 13, Minnesota. 

Wheeler, T/Sgt. R. E., U.S. Air Force, AF17261790, Hq. Sq. Sec., Hq. AAC, Box 409, APO 
942, Seattle, Washington. 

Wijsman, Robert A., Ph.D. (Univ. of Calif., Berkeley), Acting Asst. Prof., Statistics De- 
partment, University of California, Berkeley, California. 

Yarborough, (Miss) Leone, B.S. (Oklahoma A. and M. College), Student, Statistical Labo- 
ratory, Oklahoma A. and M. College, Stillwater, Oklahoma. 




REPORT OF THE WASHINGTON, D. C. MEETING OF THE INSTITUTE 
OF MATHEMATICAL STATISTICS 


The 1957 Eastern Regional Meeting, seventy-second meeting of the Institute 
of Mathematical Statistics, was held in Washington, D. C., on March 7-9, 
1957, in conjunction with the Biometric Society (Eastern North American 
Region). 

The following 145 members of the Institute registered for the meeting: 


R. L. Anderson, T. W. Anderson, James B. Bartoo, Joseph Berkson, Julius R. Blum, 
R. C. Bose, G. E. P. Box, R.A. Bradley, A. E. Brandt, Ben Buchbinder, Bradley Bucher, 
Glenn L. Burrows, John A. Carpenter, Mavis B. Carroll, Richard L. Carter, Victor Chew, 
K. L. Chung, Joseph L. Ciminera, Willard H. Clatworthy, A. C. Cohen, Jr., Ted Colton, 
Wm. I. Commins, W. S. Connor, Jerome Cornfield, E. L. Cox, Gertrude Cox, Elliot Cramer,
Jonas M. Dalton, Willis Davis, Reed Dawson, Besse B. Day, Francis R. DelPriore, Earl 
L. Diamond, Acheson J. Duncan, David B. Duncan, Arthur M. Dutton, Churchill Eisen- 
hart, Henry Eliner, Benjamin Epstein, William B. Fetters, Clarence Fine, Spencer M. 
Free, John E. Freund, Fred Frishman, Donald A. Gardiner, A. E. Garratt, John J. Gart, 
Seymour Geisser, Dorothy M. Gilford, R. Gnanadesikan, Mina H. Gourary, Franklin A. 
Graybill, B. G. Greenberg, Samuel W. Greenhouse, Joseph A. Greenwood, T. N. E. Greville, 
Shanti S. Gupta, John Gurland, J. S. Hagan, Max Halperin, Bernard Harris, Boyd Harsh-
barger, Wassily Hoeffding, W. H. Horton, Harold Hotelling, Wm. G. Howard, David C.
Hurst, Frederick V. Hurst, Jr., T. A. Jeeves, G. Kallianpur, Leo Katz, Mortimer B. Keats, 
George H. Kennedy, A. W. Kimball, Carl F. Kossack, Charles Kraft, Clyde Y. Kramer, 
Morton Kupperman, Lonnie Lasman, Fred C. Leone, Alfred Lieberman, Gilbert Lieber-
man, Julius Lieblein, Ardie Lubin, Eugene Lukacs, J. A. McFadden, G. T. McLoughlin,
Clifford J. Maloney, Arthur S. Marthens, Paul Meier, William Mendenhall, Dale M. Mesner, 
Herbert A. Meyer, Donald F. Morrison, Jack Moshman, Mervin Muller, Mary G. Natrella, 
George Nicholson, Gottfried E. Noether, Junjiro Ogawa, Carl R. Ohman, Ingram Olkin, 
K.M. Patwary, John F. Pauls, B. E. Phillips, Lila Knudsen Randolph, Wyman Richardson, 
Donald L. Richter, David Rosenblatt, Harry M. Rosenblatt, Joan R. Rosenblatt, M. Rosen-
blatt, S. N. Roy, Jagdish S. Rustagi, Rose Sachs, William H. Sammons, Daniel E. Sands,
Marvin Schneiderman, Norman Severo, Oliver A. Shaw, Walt R. Simmons, Rosedith Sit-
greaves, Morris Skibinsky, Romuald Slimak, Jean F. Smolak, Milton Sobel, Paul N. Somer-
ville, Fannie A. Stinson, H. C. Sweeny, Zen Szatrowski, Robert J. Taylor, Henry Teicher,
M. E. Terry, Malcolm E. Turner, H. Robert van der Vaart, Irving Weiss, Kathleen White, 
M. B. Wilk, John W. Wilkinson, Evan J. Williams, R. Lowell Wine, Charles W. Wright, 
W. J. Youden, Samuel Zahl, Marvin Zelen. 


The program was as follows: 
THURSDAY, MARCH 7, 1957 


9:45-10:00 a.m. Welcome from Catholic University
10:00-12:00 noon. Sessions A and B


Session A: Sample Survey Methodology 


Chairman: Boyd Harshbarger, Virginia Polytechnic Institute
Papers: 1. A Sampling Study of Sources of Information for Farm Families in Virginia,
Lowell Wine, Virginia Polytechnic Institute
2. A New Approach to General Purpose Sampling, Carl Kossack, Purdue
University
3. Recent Experiences with Area Sampling for Agricultural Statistics, R. E.
Vickery, Agricultural Marketing Service


Session B: Contributed Papers I (I.M.S.) 


Chairman: Joan R. Rosenblatt, National Bureau of Standards

Papers: 1. Synchronization of Trajectory Images of Ballistic Missiles and the Timing
Record of the Ground Telemetry Recording System, Harry P. Hartkemeier,
Stanford University (Introduced by Paul R. Rider)

2. A Further Contribution to the Theory of Systematic Statistics, Junjiro Ogawa,
University of North Carolina

3. Multivariate Analysis of Variance, S. N. Roy, University of North Carolina

4. Confidence Bounds Associated with Multivariate Analysis of Variance, S. N.
Roy and R. Gnanadesikan, University of North Carolina

5. On Statistics Independent of a Sufficient Statistic, Evan J. Williams, North
Carolina State College

6. On a Problem in Abelian Groups and the Construction of Fractionally
Replicated Designs, R. C. Bose, University of North Carolina and R. C. Burton,
National Bureau of Standards

7. The Variance of Zero-crossing Intervals, J. A. McFadden, U.S. Naval Ordnance
Laboratory, (Introduced by Gilbert Lieberman)

8. Maximum Likelihood Estimates in a Simple Queue, A. Bruce Clarke,
University of Michigan, (By Title)

9. The Joint Distribution of a Set of Sufficient Statistics for the Parameters of a
Simple Telephone Exchange Model, Václav Edvard Beneš, Bell Telephone
Laboratories, (By Title)

10. An Extension of the Cramér-Rao Inequality, John J. Gart, Virginia Polytechnic
Institute, (By Title)

11. A Limit Theorem and Bounds for an Optional Stopping Probability, Morris
Skibinsky, Michigan State University, (By Title)

12. On the Mathematical Principles Underlying the Theory of the Chi-square Test,
Junjiro Ogawa, University of North Carolina, (By Title)

13. A Limit Theorem of Cramér and Its Generalization, Junjiro Ogawa,
University of North Carolina, (By Title)

14. Some Uses of Quasi-Ranges II, J. T. Cav and F. C. Leone, Case Institute
of Technology and C. W. Torr, Fenn College, Cleveland, Ohio






1:30-3:00 p.m. Applications of Stochastic Processes 


Chairman: Ralph A. Bradley, Virginia Polytechnic Institute
Papers: 1. Some Estimation Problems in Generalized Harmonic Analysis, J. E. Freund
and W. O. Asn, Virginia Polytechnic Institute
2. Applications of Stochastic Process Theory to Problems in Aeronautics, Franklin
W. Diederich, National Advisory Committee for Aeronautics, (Langley
Field)


3:00-4:00 p.m. Tea and Social Hour 
4:00-6:00 p.m. Design of Experiments 


Chairman: D. B. Duncan, University of North Carolina 
Papers: 1. An Industrial Example of Fractional Factorials, W. H. Horton, Westing-
house Electric Corp.
2. Some Problems in Evolutionary Operation, G. E. P. Box, Princeton Univer-
sity
3. Factorial Treatments in Group Divisible Incomplete Block Designs, Clyde
Y. Kramer, Virginia Polytechnic Institute
4. The Structure of Incidence Matrices of Partially Balanced Incomplete Block
Designs, D. M. Mesner, Purdue University Center (Ft. Wayne)




FRIDAY, MARCH 8, 1957 
9:00-11:00 a.m. Electronic Computers


Chairman: Jack Moshman, Council for Economic and Industry Research
Papers: 1. The Use of Generalized Subroutines in Statistical Calculation, Romuald
Slimak, Sperry Rand Corp.
2. Some Uses of the I.B.M. 650 in Applied Statistics, H. L. Lucas, Princeton
University
3. The Mutual Troubles of Statisticians and Digital Computers, Forman S.
Acton, Princeton University
Discussant: Milton E. Terry, Bell Telephone Laboratories


11:00-12:00 noon. Methodology in Survey of Smoking Habits


Chairman: B. G. Greenberg, University of North Carolina
Paper: 1. An Investigation on Smoking Habits of Individuals, D. G. Horvitz, G. T.
Foraport, J. Monroe, J. Fierscuer, A. L. Finkner, North Carolina State
College


Discussant: Jerome Cornfield, National Institutes of Health


2:00-4:00 p.m. Contributed Papers (I.M.S. and Biometric Society) 


Chairman: Max Hauperin, National Institutes of Health 
Papers: 1. A Rank Order Test for Trend in Correlated Means, Ardie Lubin, Walter
Reed Army Institute of Research, Washington, D. C.
2. On the Stochastic Structure of Minkowski-Leontief Systems, David Rosenblatt,
American University
3. The Use of Incomplete Block Designs for Asymmetrical Factorial Arrangements,
Marvin Zelen, National Bureau of Standards
4. Extension of Some Results Given by Mitra on Statistical Analysis of Categorical
Data, Earl Diamond, University of North Carolina
5. Testing of Hypotheses on a Mixture of Variates Some of Which are Continuous
and the Rest Categorical, S. N. Roy and M. D. Movustara, University of
North Carolina


6. Generalized Quantal Response in Biological Assay, John Gurland, Iowa
State College

7. The Recovery of Intervariety Information, Bradley D. Bucher, Princeton
University

8. On the Stochastic Structure of Minkowski-Leontief Systems, II, David
Rosenblatt, American University (By Title)

9. On the Stochastic Structure of Minkowski-Leontief Systems, III, David
Rosenblatt, American University (By Title)

10. On Selecting a Subset which Contains All Populations Better Than a Standard,
Shanti S. Gupta and Milton Sobel, Bell Telephone Laboratories

11. On the Relation Between Loss Functions and Significance Levels, H. R. van
der Vaart, North Carolina State College


4:00-6:00 p.m. Recent Developments in Statistics and Probability 


Chairman: Churchill Eisenhart, National Bureau of Standards
Papers: 1. Statistical Inference, Jerome Cornfield, National Institutes of Health
2. The Efficiency of Non-parametric Tests, Gottfried Noether, Boston
University
3. A Survey of the Theory of Analytic Characteristic Functions, Eugene Lukacs,
Catholic University
4. Tests Concerning the Means of Certain Distributions, Norman Severo,
National Bureau of Standards


SATURDAY, MARCH 9, 1957 
10:00-12:00 noon. Stochastic Problems in Physics (I.M.S.)


Chairman: Kai Lai Chung, University of Chicago
Papers: 1. Applications of Stochastic Processes to Problems in Chemical Kinetics,
Elliott Montroll, University of Maryland
2. Statistical Theory of Cascade Processes, J. E. Moyal, Columbia University


Dorothy Morrow Gilford
Associate Secretary 




PUBLICATIONS RECEIVED 


Diab, M. A., The United States Capital Position and the Structure of Its Foreign Trade (Contributions
to Economic Analysis), North-Holland Publishing Company, Amsterdam,
P.O. Box 103, Holland, 1956, viii + 67 pp., $2.75.

Clatworthy, W. H., Contributions on Partially Balanced Incomplete Block Designs with
Two Associate Classes, National Bureau of Standards Applied Mathematics Series 47,
70 pages, 7 tables, 45 cents. (Order from the Superintendent of Documents, Government
Printing Office, Washington 25, D. C.)

Narasimham, N. V. A., A Short Term Planning Model for India (Contributions to Economic
Analysis), North-Holland Publishing Company, Amsterdam, P.O. Box 103, Holland,
1956, xiii + 93 pp., $2.75.

Tinbergen, J., Economic Policy: Principles and Design (Contributions to Economic Analysis),
North-Holland Publishing Company, Amsterdam, P.O. Box 103, Holland, 1956,
xxviii + 276 pp., $7.00.








TRABAJOS DE ESTADISTICA 


Review published by the “Instituto de Investigaciones Estadísticas” of the “Consejo
Superior de Investigaciones Científicas,” Madrid, Spain.


Vol. VIII CONTENTS Cuaderno I 


E. Cansado, Sampling without replacement from finite populations.
M. J. Beckmann, A demand curve for luxuries.
D. E. Barton, The modality of Neyman’s contagious distribution of type A.
B. M. Bennett, Note on the method of inverse sampling.
NOTAS 

R. Pro, Sobre el ensayo de la vacuna del Dr. Salk de 1.954.


U. Nieto de Alba, Elaboración de un modelo biométrico, su interpretación
estadística y aplicaciones actuariales.


Crónicas. Bibliografía.


For everything in connection with papers, exchanges, and subscriptions, write to Professor Sixto Rios, Instituto
de Investigaciones Estadísticas of the Consejo Superior de Investigaciones Científicas (Serrano, 123), Madrid,
Spain. The Review is published three times a year in three fascicles (about 350 pages in all); its annual
price is 100 pesetas for Spain and South America and $4.00 U.S.A. for all other countries.





ESTADISTICA 


Journal of the Inter American Statistical Institute 


Volume XIV, No. 53 Contents December 1956 
ARTICLES 

La Integración Estadística de Grupos de Países, Omar Dengo O.
The Interpretation of Origin Statistics, N. B. Ryder
The SIC from the Viewpoint of the Census Bureau, Robert W. Burgess


Programación del Desarrollo Estadístico Nacional (traducción)
Oficina de Estadística de las Naciones Unidas


What Concepts Are Appropriate to Consumer Price Indexes? Irving H. Siegel


Observaciones sobre la Utilización del Registro Civil como Fuente Básica para las
Estadísticas Vitales, Carlos A. Uriarte


El Papel de la Estadística en la Administración Hospitalaria (traducción)
Halbert L. Dunn


Special Features. Legal Provisions. International Resolutions relating to Statistics. 
Institute Affairs. Statistical News. Publications. 


Published quarterly. Annual subscription price $3.00 (U.S.)


INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 





BIOMETRIKA 


Volume 44 Contents Parts 1 and 2, June 1957 


Pearson, E. S. John Wishart, 1898-1956. Obituary and Bibliography. Armitage, P. Restricted sequential
procedures. Bartlett, M. S. On theoretical models for competitive and predatory biological systems.
Patil, V. T. The consistency and adequacy of the Poisson-Markoff model for density fluctuations. Hannan,
E. J. Testing for serial correlation in least squares regression. Kullback, S. and Rosenblatt, H. M.
On the analysis of multiple regression in k categories. Brown, R. L. Bivariate structural relation. Wilkinson,
J. W. An analysis of paired comparison designs with incomplete repetitions. Mallows, C. L. Non-null
ranking models. I. Aitchison, J. and Silvey, S. D. The generalization of probit analysis to the case
of multiple responses. Pearce, S. C. Experimenting with organisms as blocks. Cox, D. R. The use of
a concomitant variable in selecting an experimental design. Bulmer, M. G. Approximate confidence limits
for components of variance. Barton, D. E. and David, F. N. Multiple runs. Lindley, D. V. Binomial
sampling schemes and the concept of information. Lindley, D. V. A statistical paradox. Haskey, H. W.
Stochastic cross-infection between two otherwise isolated groups. Mackenzie, J. K. and Thomson, M. J.
Some statistics associated with the random disorientation of cubes. Darwin, J. H. The difference between
consecutive members of a series of random variables arranged in order of size. Harley, B. I. Relation
between the distributions of non-central t and of a transformed correlation coefficient. Cohen, A. Clifford,
Jr. On the solution of estimating equations for truncated and censored samples from normal populations.
Foster, F. G. and Rees, D. H. Upper percentage points of the generalized beta distribution. I.
Miscellanea: Contributions by D. J. Bartholomew, M. S. Bartlett, H. A. David, J. H. Darwin, F. A.
Graybill and J. L. Folks, J. Gurland, B. I. Harley and E. S. Pearson, N. L. Johnson, M. G. Kendall,
H. O. Lancaster, E. S. Page, M. H. Quenouille, G. P. Sillitto, A. Stuart, A. Wintner.


Corrigenda: I. J. Good. Reviews. Other Books Received.


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume, including postage). Cheques
should be made payable to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank
having a London agency.





A New Colloquium Publication 


VOLUME XXXVII 
STRUCTURE OF RINGS 
by NATHAN JACOBSON 


In his preface, the author points out that a number of important 
developments have recently taken place in the theory of non- 
commutative rings. These include the structure theory of rings 
without finiteness assumptions, cohomology of algebras, and 
structure and representation theory of non-semi-simple rings. 
The main purpose of this volume is to give an account of the 
first of these developments. 

$7.70 263 pages 

25% discount to members of the Society 


Order from 


AMERICAN MATHEMATICAL SOCIETY 
190 Hope Street, Providence 6, R.I. 





ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 25, No. 2, April, 1957


V. N. Murti and V. K. Sastry, Production Functions for Indian Industry

Edwin S. Mills, The Theory of Inventory Decisions

Robert A. Bandeen, Automobile Consumption, 1940-1950

Earl O. Heady, An Econometric Investigation of the Technology of Agricultural Production Functions

Robert H. Strotz, The Empirical Implications of a Utility Tree

Vernon L. Smith, Engineering Data and Statistical Techniques in the Analysis of Production and
Technological Change: Fuel Requirements of the Trucking Industry

Julian H. Blau, The Existence of Social Welfare Functions

Hans Brems, Employment and Money ... under Balanced Foreign Trade

David Rosenblatt, On Linear Models and the Graphs of Minkowski-Leontief Matrices

Stanley Reiter, A Note on Surrogates for Uncertain Decision Problems

Henri Theil, A Note on Certainty Equivalence in Dynamic Planning

REPORT OF THE SEATTLE MEETING 

REPORT OF THE DETROIT MEETING 

BOOK REVIEWS

A Study of Saving in the U. S., Vol. III (Raymond W. Goldsmith, Dorothy S. Brady, and Horst Mendershausen).
Review by L. R. Klein.

The Cost of the National Health Service in England and Wales (Brian Abel-Smith and R. M. Titmuss). Review 
by Jerome Rothenberg. 

Structures et cycles économiques (Johan Akerman). Review by Henri Guitton 

Distribution’s Place in the American Economy since 1869 (Harold Barger). Review by Robert Ferber 

British Incomes and Savings (H. F. Lydall). Review by Raymond W. Goldsmith. 

The Industrial Mobility of Labor as a Probability Process (I. Blumen, M. Kogan, and P. J. McCarthy). Review
by Thomas A. ...

The National Income of Hong Kong, 1947-50 (R. A. Ma and Edward F. Szczepanik). Review by Phyllis Deane

International Bibliography of Economics, Vols. I and II. Review by Otto H. Ehrlich

Resource Productivity, Returns to Scale, and Farm Size (E. O. Heady, G. L. Johnson, and L. S. Hardin, eds.).
Review by F. V.

Einführung in die symbolische Logik mit besonderer Berücksichtigung ihrer Anwendungen (R. Carnap). Review
by Eberhard Fels.

Urban Mortgage Lending (J. E. Morton). Review by Ben B. Sutton

La ... Economica Sovietica; Formazione-Tendenze (Giorgio Toletto). Review by Michele De Benedict
Siastnaeite of Economics (Ichiro Nakayama, ed.). Review by Hiroshi Takeda 





JOURNAL OF THE 


ROYAL STATISTICAL SOCIETY 


Series B (Methodological)


Vol. XVIII, No. 2, 1956 


Generalizations of Tchebycheff’s Inequalities, C. L. Mallows. (With Discussion).
Some Statistical Problems in Experimental Psychology, Violet R. Cane. (With Discussion).
Generalized Hypergeometric Distributions, C. D. Kemp and A. W. Kemp.
New Tables of Behrens’ Test of Significance, R. A. Fisher and M. J. R. Healy.


Fiducial Distributions and Prior Distributions: An Example in which the Former cannot be associated with
the Latter, P. M. Grundy.


Some Methods of Estimating the Parameters of Discrete Heterogeneous Populations, J. E. Freund.
The Asymptotic Powers of Certain Tests based on Multiple Correlations, E. J. Hannan.
Tests for Randomness in a Series of Events when the Alternative is a Trend, D. J. Bartholomew.
Regression Analysis of Relationships between Autocorrelated Time Series, J. Wise.
Expected Arc Length of a Gaussian Process on a Finite Interval, Irwin Miller and John E. Freund.
Analysis of Dispersion with Incomplete Observations on One of the Characters, C. R. Rao.
On Limiting Distributions Arising in Bulk Service Queues, F. Downton.
Some Equilibrium Results for the Queueing Process E_k/M/1, R. R. P. Jackson and D. G. Nickols.
On Machine Interference, P. Naor.
Note on an Article by Sir Ronald Fisher, Jerzy Neyman.
Comment on Sir Ronald Fisher’s Paper: “On a Test of Significance in Pearson’s Biometrika Tables (No. 11)”,
M. S. Bartlett.
Note on some Criticisms made by Sir Ronald Fisher, B. L. Welch.


The Royal Statistical Society, 21, Bentinck Street, London, W. 1 





SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 17, Part 4, 1956 


Some observations on input-output analysis, Oskar Lange
The use of short-term Econometric model for Indian Economic policy, J. Tinbergen
Approximate distribution of certain linear function of order statistics, K. C. Seal
On the unboundedness of infinitely divisible laws, S. D. Chatterjee and R. P. Pakshirajan
A note on the orthogonal latin squares, Nikhilesh Bhattacharyya
A new discrete distribution, Ayodhya Prasad
An experimental method for obtaining random digits and permutations, John E. Walsh
On estimating parametric functions in stratified sampling designs, Des Raj
A note on variance components in multistage sampling with ..., J. Roy
A note on two stage sampling, Rangarajan
Method of matching used for the estimation of test reliability, P. K. Bose and S. B. Roychoudhury
Recommendations for personnel selection in India based on the British selection methods in civil service and
industry, Rhea S. Das
Isolation of some morale dimensions by factor analysis, H. C. Ganguli
Inversion of 25 x 25 matrix on a 602A calculating punch, D. Bose and A. Roy


Annual subscription: 30 rupees ($10.00), 10 rupees ($3.50) per issue.
Back numbers: 45 rupees ($15.00) per volume; 12/8 rupees ($4.50) per issue.
Subscriptions and orders for back numbers should be sent to 
STATISTICAL PUBLISHING SOCIETY 
204/1 Barrackpore Trunk Road Calcutta 35, India 








SKANDINAVISK 


AKTUARIETIDSKRIFT 


1956 - Parts 1 - 2 


Contents 


S. Nordbotten, Allocation in Stratified Sampling by Means of Linear Programming
Ph. G. Carlson, A Least Squares Interpretation of the Bivariate Line of Organic Correlation
B. M. Bennett, On a Rank-Order Test for the Equality of Probability of an Event
P. G. Moore, The Transformation of a Truncated Poisson Distribution
C. Philipson, A Note on Different Models of Stochastic Processes Dealt with in the Collective Theory of Risk


H. L. Seal, Erratum
J. E. Walsh, Actuarial Validity of the Binomial Distribution for Large Numbers of Lives with Small
Mortality Probabilities

Estimating Population Mean, Variance, and Percentage Points from Truncated Data 
On a Characterization of the Normal Distribution 

Explicit Expressions for the First Four Moments of a Truncated Distribution defined by
Pearson Type VI
U. Grenander, On the Theory of Mortality Measurement. Part I
H. E. Stelson, Laplace Transforms Applied to Interest Functions
Eksamen i Forsikringsvidenskab og Statistik ved Københavns Universitet
Oversikt av utländska aktuarietidskrifter
De skandinaviske aktuarforeningers virksomhed i 1955
De nordiska aktuarieföreningarnas pris för perioden 1948-1955


Annual subscription: $5.00 per year 
Inquiries and orders may be addressed to the Editor 


GRANHALLSVAGEN 35, STOCKSUND, SWEDEN 





